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Preface 



Science is what we understand well enough to explain to a computer. 

Donald E. Knuth 



Computing facilities - like work stations and PCs, with high speed and large memory 
are becoming increasingly and easily available to researchers and students since recent 
times. This is particularly true in our country; the growth has been rapid and is most 
likely to become more rapid in the coming years. As a result, computer simulation 
methods are gaining in importance and popularity as viable and useful tools for research 
and education. 

Monte Carlo is an early and an important computer simulation technique. Named 
after the city (in the province of Monoco, south of Prance) famous for its (gambling) 
casinos, the Monte Carlo method makes extensive use of random numbers. It is em- 
ployed for simulation of a large variety of phenomena in very many different disciplines. 
However, in this monograph, 1 shall confine myself to elucidating the technique for sim- 
ulation of statistical physics systems. 

We begin with a quick look at what do we mean by ensembles in general and 
Gibbs ensemble in particular. We discuss briefly micro canonical ensemble that models 
isolated system, canonical ensemble that describes closed system and grand canonical 
ensemble, useful in the study of open system. In the following few sections I present 
briefly the basic ingredients of the method of Monte Carlo. These include discussions 
on random numbers, generation and testing of a sequence of pseudo random numbers, 
random sampling from different distributions, importance sampling and statistical error 
bars from uncorrelated and correlated data. I have tried to keep the presentation as 
simple as possible. 

These preliminary discussions are followed by a reasonably detailed description of 
the Monte Carlo methods for simulating a canonical ensemble of microstates of a clas- 
sical statistical mechanics system. The topics covered are: Markov chains. Metropolis 
algorithm, Ising model, continuous phase transition, critical exponents, finite size scal- 
ing, n-fold way, critical slowing down, cluster algorithms, cluster counting, percolation, 
histogram techniques, supercritical slowing down, multicanonical /entropic sampling, 
Wang-Landau algorithm and Jarzynski's equality relating nonequilibrium work done 
to equilibrium free energies. I have tried to present these topics the way I have seen 
them, the way I have learnt them and the way I am teaching them. 

My first word of thanks goes to Prof. A. K. Bhatnagar, the Central University 
of Hyderabad and presently the vice chancelor of the Pondichery University. Prof. 
Bhatnagar met me during a conference in Kurukshetra and spoke of his plans to start 
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a post-graduate course on computations in the school of physics at the Hyderabad 
University. He asked of me if I would be willing to handle the portions on Monte Carlo 
and Molecular dynamics. 1 readily agreed. On his invitation I went to Hyderabad 
during November and December 1998 and gave lectures on Monte Carlo and Molecular 
dynamics to the first batch of students of the M. Tech. Computational physics pro- 
gramme. On my request Prof. V. S. S. Sastri, Prof. K. Venu, Prof. Bansal and Prof. 
Srivatsav attended all the lectures and gave me valuable criticisms and suggestions. 
1 take this opportunity to say thanks to them and the students of the course. Prof. 
Bhatnagar invited me again during January-February 2000 for giving the same course 
to the second batch of students. While writing this monograph I have drawn heavily 
from the notes I prepared for the Hyderabad courses. I thank Prof. Bhatnagar for giv- 
ing me this wonderful opportunity to teach and to learn. I requested Prof. Bhatnagar 
to give a foreword to this monograph and he readily agreed. 

Subsequently I got opportunities to speak on Monte Carlo theory and practice in 
several places some of which are listed below. 

Guru Ghasidas University, Bilaspur (tutorial lecture at the Solid State Physics Sym- 
posium: 26-30, December 2000), 

Madurai Kamaraj University, Madurai (Seminar on Computer Simulations in Physics, 
4-5, February 2002), 

Calicut University, Calicut (Refresher course for the college teachers, 25-30 Novem- 
ber 2002), 

Bharathidasan University, Tiruchirappalli (UGC refresher course in Physics, 10-30 
December 2002), 

Annamalai University, Annamalai Nagar (UGC sponsored refresher course, 19 De- 
cember 2002 - 8 January 2003), 

and 

the Materials Science Division, Indira Gandhi Centre for Atomic Research, Kalpakkam 
(to several doctoral students, post doctoral students and colleagues) for more than 
a decade. 

I thank M. Ramanadham (Mumbai), K. Ramachandran (Madurai), P. Remesan 
(Calicut), K. Ramamurthi (Tiruchirappalli) and A. N. Kannappan (Annamalai Nagar) 
for the invitations. 

I am thankful to V. Sridhar (Kalpakkam), R. Indira (Kalpakkam), V. S. S. Sastri 
(Hyderabad), K. Venu (Hyderabad), M. C. Valsakumar (Kalpakkam), S. L. Narasimhan 
(Mumbai), K. Satyavathi (Hyderabad), Karl- Heinz Herrmann (Jiilich, Kalpakkam), 
Giancarlo Franzes (Italy), S. Sivakumar (Kalpakkam), R. Harish (Kalpakkam), Jean- 
Feng Gwan (Taiwan, Jiilich) and Vishal Mehra (New Delhi, Jiilich) for criticisms and 
suggestions. 

I owe a special word of thanks to my friend Dr. A. Natarajan (Kalpakkam); Dr. 
Natarajan is a man with very big heart and is genuinely conccerned with lives and 
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careers of his coUegues and juniors; he always finds time to actively involve himself with 
new ventures of his colegues and encourage them in their work. Dr. Natarajan read 
through the entire manuscript carefully and critically; he marked several corrections in 
the manuscript and suggested revisions in several places to improve the readability. I 
am extremely thankful to Dr. A. Natarajan. 

I have intended this monograph only as an introduction to this vast and rapidly 
growing field. The aim is to help you obtain a feel for the method and an appreciation 
for the nuances in the different techniques including the recent ones. I have tried to be 
simple in the presentation, up-to-date in the coverage of several and important topics, 
and exhaustive in the bibliography. 1 hope these notes shall become a useful addition 
to the bookshelves of researchers and students. 



Kalpakkam 

9 November 2003 



K. P. N. 
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1 Prologue 



A macroscopic system consists of a very large number of microscopic constituents. A 
gas in a container is a simple example. The microscopic constituents are the molecules 
of the gas. The number of molecules N is typically of the order of 10^^. Each molecule 
requires three position and three momentum coordinates for complete specification in 
classical mechanics. The entire macroscopic system at any given instant of time can 
thus be specified by a string of 6A^ numbers, which defines, in a sense, a microstate. In 
a 6A^ dimensional phase space, the system is represented by a point; or more precisely, 
by a phase space element of volume h^^ (owing its origin to our quantum mechanical 
inability to specify precisely the position and its conjugate momentum simultaneously). 
Here, h is the Planck's constant. ^ It is clear that the number of microstates associated 
with the system is extremely large. The (equilibrium) system is switching from one 
microstate to the other all the time. In a classical picture, the point representing 
the system traces a trajectory in the 6A^ dimensional phase space as time proceeds. 
Keeping track of 6N numbers as a function of time is neither feasible nor practical. A 
statistical approach would be helpful. 

Any experimentally measured equilibrium macroscopic property is a time averaged 
quantity : averaged over the phase space trajectory traced by the macroscopic system 
during the observation time. Let us assume that during the experimental observation 
time, the system visits all the (typical?) microstates, consistent with the constraints 
on volume, number of particles, energy etc. This is called ergodicity assumption. This 
assumption is fine since the observation time scale is very large compared to the time 
scale over which the system switches from one microstate to the other. We can then 
equate the 'experimental' time average to an average over a suitable static Gibbs 
ensemble of microstates. For such a scheme to be successful we must, at the outset, 
recognize that a macroscopic property is basically statistical in nature. For example, 
pressure is the average momentum transferred by the molecules colliding with the walls 
of the container; entropy is the logarithm of the number of microstates accessible to 
the system; temperature is average energy; specific heat is a manifestation of energy 
fluctuations; etc. 

When in equilibrium, the macroscopic properties of a system do not change with 
time; hence a macroscopic property can be defined as a time average over an underlying 
(stationary) stochastic process or equivalently an average over a suitably defined Gibbs 
ensemble; the associated fluctuations are inversely proportional to the square root of 
the system size and hence are usually negligibly small. This is directly a consequence 
of the Central Limit Theorem: the distribution of the sum of N independent and finite 
variance random variables converges to a Gaussian in the limit N oo, and has a 
variance proportional to N y]. ^ Thus the standard deviation of the arithmetic mean 
of these random variables is inversely proportional to Physically this implies 

that the fluctuations of a macroscopic property from its average is extremely small and 
inversely proportional to the size (number of microscopic constituents) of the system. 

^The Planck's constant h = 6.626 x lO"^** Joules. second or 4.136 x 10~^^ electron volts. second. 

^For an account of the rate of convergence of the sum to a Gaussian random variable see For 
an elegant demonstration of the Central Limit Theorem employing renormalization and scaling ideas, 
see P]- 
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Clearly it is the largeness of the number of microscopic constituents that gives rise to 
a certain robustness to the macroscopic behaviour of the system. 

1.1 What is an ensemble? What is a Gibbs ensemble? 

To understand the notion of an ensemble consider an experiment which has more than 
one outcome. Let us collect all the possible outcomes of the experiment in a set called 

the sample space Q = {oui : i — 1, 2, ■ ■ ■ , Cl}, where Ui denotes an outcome and Cl the 
number of outcomes of the experiment. Let P{uJi) > be the probability of the outcome 
Ui. Wc have Ylf^i P{uJi) = 1. Thus the sample space and the probability P define 
the experiment. Physicists however use a single notion of an ensemble, introduced 
by Maxwell and denoted by Qe- The members of the ensemble are drawn from the 
sample space Q. However an out come uji appears in the ensemble n{LJi) times so that 
n{uJi)/ClE = P{uJi), where CIe is the size of the ensemble; Qe is taken adequately large 
so that all outcomes of the experiment appear in the ensemble in correct proportions. 
We do not specify any rule for the construction of an ensemble; we can employ Monte 
Carlo (subject of this monograph) or Molecular Dynamics or for that matter any other 
technique to construct an ensemble. 

A Gibbs ensemble contains microstates of an equilibrium macroscopic system under 
the macroscopic constraints operating on the system. Depending on the nature of the 
constraints we have different Gibbs ensembles. 

1.1.1 Isolated system: Microcanonical ensemble 

Consider an isolated macroscopic equilibrium system. The constraints are on : Energy 
(E), Volume (V) and Number of particles (N). Collect all possible microstates of the 
system into a set Qis, called the sample space. All microstates of an isolated system 
are equally probable. ^ Also, because of the constraints, all the microstates are of 
the same energy and volume and have the same number of particles. Let 0{C) be a 
macroscopic property of the system when in its microstate C. Formally the average of 
O is given by 

(o) = i E o{c) (1) 

i% C e His 

where Qis is the total number of microstates of the system. In fact the entropy of the 
system is given hj S = /cBlnl^jg, where is the Boltzmann constant. Thus the 
probability P{C) for an isolated system to be in a microstate C is l/l^is or exp(— ^//cb), 
and is the same for all the microstates. 

Equivalently, we can construct a microcanonical ensemble f2^cE of microstates; a 
microstate C G flis occurs in l^^cE as often as to reflect its probability P{C). In other 
words, the number of times the microstate C occurs in the microcanonical ensemble 
^(uCE divided by the size of the ensemble (denoted by I^^ce) equals P{C). For an 
isolated system all microstates belonging to Qis occur in equal proportions in JI^ce- 

^This is an axiom; the entire edifice of statistical mechanics is built on this axiom. 
^The Boltzmann constant fce equals 1.381 x 10~^^ Joules per degree Kelvin or 8.671 x 10~^ electron 
volts per degree Kelvin. 
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1.1.2 Closed system: Canonical ensemble 

Consider now an equilibrium macroscopic system in thermal contact with a heat bath 
at temperature say T. Only the volume (V) and the number of particles (N) are fixed. 
The energy is not fixed. Instead the temperature of the system equals that of the heat 
bath when in equilibrium. This is a closed system: the system exchanges energy with 
the heat bath but not particles. Let flcs denote the set of all possible microstates 
of the closed system at temperature T. f2cs is called the sample space of the closed 
system; since the system can exchange energy with the heat bath, different microstates 
belonging to Qcs can have different energies. ^ Also the probability with which a 
closed system can be found in its microstate C is proportional to exp[—i3E{C)] where 
l3 = 1/(^3^). The probability of occurrence of a microstate C in a closed system 
depends on the energy of the microstate and the temperature of the system. Thus to 
each C G flcs we attach a Boltzmann weight exp[—f3E{C)]. The sum of the Boltzmann 
weights taken over all the microstates belonging to Qqs is called the canonical partition 
function Z{P), given by, 

Z{P,V,N) = J2 exp [-/?E(C)] . (2) 
C e fics 

Let 0(C) be a macroscopic property of the closed system when in its microstate C. 
The average of O is formally given by. 

To calculate {0{j3)), we can take an alternate approach. Let ^ce denote a canonical 
ensemble of microstates belonging to the closed system. A microstate C G Qcs occurs 
in the ensemble ^ce as often as to reflect its probability -P(C) = exp[—/3E{C)]/Z{P, V, N). 
Thus the number of times a microstate C G Qqs occurs in the canonical ensemble ^ce 
divided by the size, CIqe of the ensemble is given by P{C). The advantage of construct- 
ing a canonical ensemble is that a macroscopic property can be calculated as a simple 
arithmetic mean, 

{om = J- E 0(c). (4) 

"CE c e QcE 

The size CIqe of the canonical ensemble is usually taken to be large so that even those 
microstates with very low probability are present in the ensemble in right proportions. 
We shall follow the convention of denoting an element of a sample space by script 
letters, e.g. C and that of an ensemble by capital letters e.g. C. 

1.1.3 Open system: Grand canonical ensemble 

We can proceed further and consider an open system in which there is an exchange 
of energy as well as particles with the surrounding bath. The constraints are on the 

''There can be many microstates having the same energy E. Logarithm of this number muhiphed 
by /cb shall be called suggestively as microcanonical entropy S{E). We shall find this terminology 
useful when we consider entropic/multicanonical sampling in sections (|23f) and l|24|l . 
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temperature (T), volume (V) and chemical potential (/i). Let Qqs denote the sample 
space. The open system can be found in a microstate C with a probability proportional 
to the Gibbs factor given by exp[—i3E{C) + ^(3N{C)], where N{C) is the number of 
particles in the open system when it is in its microstate C. Thus to each microstate 
C we attach a Gibbs weight exp[— /3i?(C) + /i/3iV(C)]. The sum of the Gibbs weights 
taken over all the microstates of the open system is called the grand canonical partition 
function, 

= exp[-/3E(C)+/i/3iV(C)] . (5) 

C e Hos 

The average of a macroscopic property O is formally given by, 

(0(/3,V^,/x)) = ^ E 0(C)exp[-/3E(C)+/i/3iV(C)]. (6) 

We can calculate the average by constructing a grand canonical ensemble ^gce of 
microstates; the number of times a microstate C, belonging to fios occurs in VLqcb 
divided by the size of ^gcb (denoted by I^gce) is given by exp[— /5i?(C) + fi/3N{C)]/ Q. 
Thus (O) is given by a simple arithmetic average, 

(0(/5,V^,/x)) = J— E 0{C) (7) 

"GCE C e Qgce 

Besides the microcanonical, canonical and grand canonical ensembles, we can con- 
struct other physical ensembles like isobaric-isothermal ensembles etc., depending on 
the problem under considerations. The choice of the ensemble is purely a matter of 
convenience dictated by the nature of the physical system under investigation and the 
nature of the macroscopic properties we want to calculate. Indeed we shall see later 
that even construction of unphysical ensembles like multicanonical ensemble proves to 
have certain distinct advantages. 

However, in these notes I shall concentrate on the calculation of a macroscopic 
property as an average over a canonical ensemble of microstates generated by the 
technique of Monte Carlo. 

2 What is a Monte Carlo method? 

Monte Carlo is a numerical technique that makes use of random numbers to solve a 
problem. Historically, the first large scale Monte Carlo work carried out dates back to 
the middle of the twentieth century. This work pertained to studies of neutron mul- 
tiplication, scattering, propagation and eventual absorption in a medium or leakage 
from it. Stanislav Ulam, John von Neumann and Enrico Fermi were the first to pro- 
pose and employ the Monte Carlo method as a viable numerical technique for solving 
practical problems. The earliest published work on Monte Carlo is perhaps the paper 
of Metropolis and Ulam 0] in the year 1949. ^ 

^There were of course several isolated and perhaps not fully developed instances earlier, when 
Monte Carlo has been used in some form or the other. An example is the experiment performed in 
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Thus, for carrying out a Monte Carlo simulation, we require a sequence of numbers 
which are random, independent, real and uniformly distributed in the range zero to 
one. Strictly, we can call a sequence of numbers random, if and only if it is generated 
by a random physical process like radioactive decay, thermal noise in electronic devices, 
cosmic ray arrival times, tossing of a coin etc. These phenomena are known to be truly 
random at least according to the present day theories. Generate once and for all, a 
fairly long sequence of random numbers from a random physical process. Employ this 
sequence in your Monte Carlo calculations. This is always a safe procedure. Indeed, 
tables of random numbers were generated and employed in early Monte Carlo practice. 
Census reports jTj, telephone directories [llj, spinning wheel [Ullll], automatic elec- 
tronic roulette radioactive decay thermal noise in a semi-conductor device'' 
etc., were employed to generate large tables of random digits. 

A table of random numbers is useful if the Monte Carlo calculation is carried out 
manually. However, for computer calculations, use of these tables is impractical. A 
computer, having a relatively small internal memory, cannot hold a large table of 
random numbers. One can keep the random numbers in an external storage device like 
a disk or a tape; but then, constant retrieval from these peripherals would considerably 
slow down the computations. Hence it is often desirable to generate a random number 
as and when required, employing simple arithmetic operations that do not take much 
computer time and the algorithm itself does not require much memory for storage. 
These numbers, generated by deterministic algorithms, are therefore predictable and 
reproducible. Hence by no stretch of imagination can they be called random. We shall 
call them pseudo random. We shall be content with pseudo random numbers if we find 
they go through tests of randomness satisfactorily. In fact, it is always a good practice 
to check if the sequence of random numbers you are going to use in your simulation, 
goes through several standard tests for randomness. It is also desirable to devise special 
tests of randomness, depending on the particular application you have in mind. I shall 
come to tests for randomness a bit later. 



the middle of the nineteenth century, consisting of throwing a needle randomly on a board notched 
with parallel lines and inferring the numerical value of tt from the number times the needle intersects 
a line; this is known as Buffon's needle problem, an early description of which can be found in The 
Quincunx constructed by Galton |2| in the second half of the nineteenth century, consisted of balls 
rolling down an array of pins and getting collected in the vertical compartments placed at the bottom. 
A pin deflects the rolling ball randomly and with equal probability to its left or right. The heights of the 
balls in the compartments approximate a binomial distribution. Perhaps this is an early technique of 
physically sampling random numbers from a binomial distribution. In nineteen twenties, Karl Pearson 
perceived the use of random numbers for solving complex problems in probability theory and statistics 
that were not amenable to exact analytical solutions. Pearson encouraged L. H. C. Tippet to produce 
a table of random numbers to help in such studies, and a book of random sampling numbers was 
published in the year 1927. 

In India, P. C. Mahalanbois jHj exploited ' random sampling ' technique to solve a variety problems 
like the choice of optimum sampling plans in survey work, choice of optimum size and shape of plots 
in experimental work etc., see [O]. 

Indeed, descriptions of several modern Monte Carlo techniques appear in a paper by Kelvin |rO] . 
written nearly more than hundred years ago, in the context of a discussion on the Boltzmann equation. 
But Kelvin was more interested in the results than in the technique, which to him was obviousl 

^M. Richter: millions of physical random numbers were generated by measuring thermal noise of 
a semiconductor device; these are available upon an anonymous ftp call to site at dfv.rwth-aachen.de 
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3 How do we generate a sequence of pseudo random 
numbers? 

3.1 Congruential Generators 

In a linear congruential generator, for example, we start with an integer Ri and generate 
successive integers by the recursion, 

Ri+i = a X Ri + b (mod m), (8) 

where a, b and m are integers, a is called the generator or multiplier, b the increment 
and m the modulus. If the increment b is zero, we call the generator multiplicative 
congruential and if 6 > 0, it is called the mixed congruential. The above equation 
means the following. Start with an integer Ri, between zero and m — 1. This is your 
choice. Calculate a x Ri + b. Divide the result by m and find the remainder. Call it 
i?2- Calculate a x R2 + b; divide the result by m; find the remainder and call it R^. 
Proceed in the same fashion and generate a sequence of integers {-Ri, R2, ■ ■ ■}, initiated 
by the seed Ri. Thus {0 < Ri < m — 1 : i = 1,2, ■■■}isa sequence of pseudo random 
integers. This sequence can be converted to floating point numbers by dividing each 
by m. 

A congruential generator is robust, simple and fast; the theory is reasonably well 
understood. It gives a fairly long sequence of 'good' pseudo random numbers. The 
values of a, b and m must be chosen carefully. Clearly, whatever be the choice of a, b 
and m, the sequence shall repeat itself after utmost m steps. The sequence is therefore 
periodic. Hence in applications, we must ensure that the number of random numbers 
required for any single simulation must be much less than the period. Usually m is 
taken to be very large to permit this. We can always get a sequence with full period, if 
m, b and a are properly chosen. ^ The modulus m is usually taken as 2*~^ — 1, where 
t is the number of bits used to store an integer and hence is machine specific. One of 
the t bits is used up for storing the sign of the integer. The choice a = 7^ = 16807, 
6 = and m = 2^^ — 1, for a 32 bit machine has been shown to yield good results jT3]. 
The period of this generator is 2^^ — 1 = 2.15 x 10^, which is really not long for several 
applications. Another popular choice of parameters |TH] consists of a = 69069, 6 = 1, 
and m = 2^^, which also has a short period of 4.29 x 10^. The choice of a = 13^^, 
6 = and m = 2^^ in the generator G05FAF of the NAG library |r3| has a much longer 
period of nearly 5.76 x 10^''. The congruential generators have been successful and 
invariably most of the present day pseudo random number generators are of this class. 

3.2 Marsaglia lattice structure 

Let ^i,^2;^3;''' denote a sequence of random numbers from a linear congruential 
generator. Form from these numbers, a sequence of rf-tuples: Vi = {(,1,^2, ■ ■ ■ ,(,d), 
V2 = (6,6, ■ ■ -.^d+i), ^^3 = (6,6,---,6+2)---- View the d-tuples {vi ,V2, vs,---} 
as points in the unit hyper cube of d dimensions. You would expect these points to 

®We must ensure that (i) m and b are relatively prime to each other; i.e. gcd{m,b) = 1; (ii) 
a = 1 (mod p) for every prime factor p of to; and (iii) a = 1 (mod 4), if to = (mod 4). 
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be more or less uniformly distributed in the hyper cube. But invariably you will find 
that these points are confined to a relatively smaller number of parallel planes, called 
the Marsaglia lattice structure named after its discoverer ^Hj. The points on several 
of the Marsaglia planes form regular patterns. Such a hidden Marsaglia order is an 
inherent feature of all linear congruential generators. The randomness is truly pseudo 
indeed! Hence you must check very carefully if the inevitable presence of Marasaglia 
defect in the pseudo random number sequence affects significantly the results of your 
Monte Carlo application. Hopefully it doesn't. But then one never knows. Somehow 
the Monte Carlo community has learnt to live with the Marsaglia defect over the last 
three decades! 

You can increase the number of Marsaglia planes by a second order recursion, 

Ri+2 = « X Ri+i + b X Ri (mod m), (9) 

and choosing properly the recursion parameters a and b. Note that the above recursion 
requires two seeds. 

3.3 Shift register generators 

An alternative that does not suffer from the correlations invariably present in the 
congruential generators is based on the generalized feedback shift register algorithms 
[in]. These are also called the Tausworthe generators 120]. Let Ri, R2 ■■■ denote a 
sequence of random binary integers. We define a recursion as follows, 

Ri = Ri_p + Ri_i (mod 2) q < p < i . (10) 

It is easily seen that the above addition of binary digits is the same as the exclusive-OR 
operation ( .XOR. ), which can be carried out in a computer rather fast: 

Ri = Ri-p -XOR. Ri-q . (11) 

For proper choice of values of p and q one can get a maximal period of 2^—1. A standard 
choice proposed by Kirkpatrick and StoU consists oi p = 250 and q = 103. This is 
called the R250 generator; it has a long period of 2^^° — 1 ^ 1.81 x lO''^. We need to 
provide 250 seeds for which we can employ another random number generator e.g. the 
linear congruential generator with a = 16807, 6=1 and m = 2^^ — 1 as recommended 
by Kirkpatrick and StoU |2I] . We shall see more about the R250 generator later on in 
section 14.31 

There are also several other alternatives that have been proposed. These include for 
example the inversive j|22j and explicit inversive j2Sl generators which employ nonlinear 
recursions. I do not want to go into the details of random number generation and 
instead refer you to [Mj for some additional literature on this important topic. 

4 How do we know that a given sequence of num- 
bers is random? 

This is not an easy question to answer, especially when we know that the sequence 
has been generated by a deterministic algorithm and hence is predictable and repro- 
ducible. We take a practical attitude and say that it suffices if we establish that the 
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pseudo random number sequence is indistinguishable statistically from a sequence of 
real random numbers. This brings us to tests for randomness. 

A randomness test, in a general sense, consists of constructing a function 
V' ('"i! ^2-, ■ ■ ■)) where ri. r2, ■ ■ ■, are independent variables. Calculate the value of this 
function for a sequence of pseudo random numbers by setting = V i = 1, 2, • • •. 
Compare this value with the value that -0 is expected to have if {r^ : i = 1, 2, • • •} were 
truly random and distributed independently and uniformly in the range zero to unity. 

4.1 Test based on calculation of the mean 

For example, the simplest test one can think of, is to set 

^ N 

V'(n,r2,---) = T7E^- (12) 

-'^ i=l 

which defines the average of numbers. For a sequence of N truly random numbers, 
we expect ip to lie between .5 — e & .5 + e with a certain probability p{e). Notice that for 
N large, from the Central Limit Theorem, is Gaussian with mean 0.5 and variance 
(j^ = (t'^/N, where = 1/12 is the variance of the random variable r, uniformly 
distributed in the interval (0, 1). If we take e = 2a^ = l/y/SN, then p(e) is the area 
under the Gaussian between .5 — e and .5+e and is equal to 0.9545. e is called two-sigma 
confidence interval. I shall discuss in details the issue of confidence interval / statistical 
error very soon. Thus for a sequence of truly random numbers, we expect its mean 
to be within ±e around 0.5 with .95 probability, for large A^. If a sequence of A^ pseudo 
random numbers has an average that falls outside the interval (0.5 — e, 0.5 + e) then 
we say that it fails the test at 5% level. 

4.2 Run test 

In practice, one employs more sophisticated tests, by making ip a complicated function 
of its arguments. An illustrative example is the run test which is sensitive to the 
correlations. The idea is to calculate the length of a run of increasing (run-up) or 
decreasing (run-down) size. We say a run-down length is / if we have a sequence of 
random numbers such that > < < ^/_2 < ■ ■ ■ < ^2 < Ci- We can similarly 
define the run-up length. 

Let me illustrate the meaning of run-down length by considering a sequence of 
integers between and 9, given below. 

6413 45 327 48 
|6 4 1 3|4 5|3 2 7|4 8| 

I 3 I 1 I 2 I 1 I 

The first row displays the sequence; the second depicts the same sequence with numbers 
separated into groups by vertical lines. Each group contains n + 1 numbers, the first n 
of which are in descending order, i.e. these numbers are running down. The descent is 
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broken by the [n + 1)*^ number which is greater than the n^^ number. The third row 
gives the run-down length, for each group, which is n. 

We can calculate the probability distribution of the run-down length as follows. Let 
P{L > I) be the probability that the run-down length L is greater than or equal to 
/. To calculate P{L > I) we consider a sequence of / distinct random numbers. There 
are /! ways of arranging these numbers. For a sequence of truly random numbers, 
each of these arrangements is equally likely. Of these, there is only one arrangement 
which has all the I numbers in the descending order. Hence P{L > I) = Since 
P(L = l) = P{L >l) - P{L>1 + 1), we get 

Alternately, for the probability of run-down length we have, 

/■I rii-1 
P{L>1) = diJ d^2--- / d^i 
Jo Jo Jo 



1 
n 



(14) 



In the test, we determine numerically, the distribution of the run length on the se- 
quence of pseudo random numbers and compare it with the exact distribution given 
by Eq. (fT^. Figure ((T)) depicts the results of a run-down test. 



4.3 The rise and fall of R250 

Generation and testing of random numbers is a specialized area of research. Several 
important pieces of work have been carried out in these areas. The field remains active 
even today. The reason for this is simple: There exists a large number of questions that 
remain unanswered. In fact, even the very basic issue of what is meant by randomness 
of a sequence of numbers, is not clearly understood. I shall not discuss these and 
related issues here and instead, refer you to |2S1 for some interesting literature on this 
intriguing subject. For some literature on testing of randomness of a pseudo random 
number sequence see [21]. The use of pseudo random numbers in a Monte Carlo 
simulation is perhaps all right for most of the application; in any case we have no other 
choice; it is always prudent to be watchful; the subtle correlations and patterns present 
in a sequence of pseudo random numbers may have non trivial effects on the results of 
your simulation. 

The (hi) story of the rise and fall of R250 generator should warn us of how delicate 
the issue of pseudo random number generation is. I have already talked about the gener- 
alized feedback shift register algorithm in section EISl R250 is an early implementation 
of the shift register algorithm |^ • It "was introduced in the early nineteen eighties 
and became an instant hit amongst the Monte Carlo practitioners. Troubles started 
in the early nineteen nineties. In the year 1992 Ferrenberg, Landau and Wong 
reported severe problems with the generator in the Monte Carlo simulation of a two 
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3 4 
run-up length, I 



Figure 1: Results of a run-down test. The histogram plot corresponds to what is expected 
from a sequence of truly random numbers. See Eq. pHj) . The points with error bars cor- 
respond to the ones obtained with random number generator routine RAND of MATLAB 
version 4.1. The run-down lengths were generated from a sequence of 10,000 numbers. The 
statistical errors in the calculated values correspond to one-sigma confidence interval. 
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dimensional Ising model employing Wolff cluster algorithm. ^ The calculated values 
of energy and specific heat were erroneous; however, the same algorithm gave correct 
results when other random number generators were employed. These startling findings 
were confirmed by Coddington [2H] in the year 1994. The culprit was eventually traced 
to the triplet correlations in R250 in particular [211 and in the shift register algorithms 
for certain specific choices of the parameters, in general jSUI. See also [31] for more 
literature on how the correlations present in the random numbers would affect the re- 
sults of the simulations. Perhaps one is tempted to agree with Compagner jHSI when, 
tongue in cheek, he says that " random sequences are easy to generate in a computer 
except when needed for reliable Monte Carlo simulations " ! But as I said earlier, we 
have no other choice; we have to depend on deterministic algorithms for generating 
random numbers. The lesson from the above story of the R250 generator is that do 
not use your 'pet' recursions or algorithms for generating random numbers for your 
applications. Follow the advise of Janke [S3]: Use only well known generators which 
have been tested and which are being employed by a large number of Monte Carlo 
practitioners. If such a generator poses problems in your specific application, then you 
can at least be assured that the Monte Carlo community would work hard to track 
down the problem and perhaps resolve it, like it did in the case of R250 generator. 



5 How do we sample from a given distribution? 

5.1 Analytical inversion and sampling from exponential 

Let {^i : i = 1, 2, ■ ■ ■ } denotes the sequence of pseudo random numbers. These are 
independent, real and uniformly distributed in the range (0,1). From {^j}, we can 
construct a sequence of independent random numbers, {xj} having the desired distri- 
bution, say f{x). Let F{x) denote the cumulative distribution defined as, 

F{x) = r f{x')dx' . (15) 

J —oo 

Then {xi = F~^(^j) : i = 1, 2, ■ ■ ■} constitute an ensemble of real numbers, whose 
distribution is the desired f{x). For example, 

{x, = -Aln(l-e.): z = l, 2, ■■■} 
are independent random numbers distributed as per the exponential distribution, 

{A~^exp(— x/A) for X > , 
(16) 
for X < . 

5.2 Fernandez-Rivero technique for sampling from exponen- 
tial 

To sample from an exponential distribution, Fernandez and Rivero [HI] proposed a 
simple algorithm inspired by statistical physics. Start with particles indexed by 

^We shall discuss the Ising model and the Wolff cluster algorithm sometime later; see sections |H1 
andEEU 
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integers i = 1, 2, ■ ■ ■ A^. Initialize x{k) = A V A;. The particles 'interact' in the 
following fashion. Select independently and randomly two particles, say i and j with 
i 7^ j. Let S = x{i) + x{j)] split S randomly into two parts, assign them to the particles 
i and j and return. Repeat the above several times until the system equilibrates. 
Some AN iterations are recommended for equilibration, see After the system has 
equilibrated, every time you select two particles, say / and k and / 7^ k, for interaction, 
you have x{k) and x{l) as two independent random numbers distributed exponentially. 
For more details see [Hlj . 



5.3 Sampling from Gaussian 
5.3.1 Fernandez-Criado technique 

In the same spirit as in section (|5.2|) . set x^^\i) = 1 \/ i and let the randomly chosen 
particles i and j with i ^ j interact in the following fashion, 

where the superscript is the iteration index and {ry('^) ; k = 0, 1, ■ ■ ■} are identically 
distributed independent random variables taking values ±1 with equal probabilities. 
The 'equilibrium' distribution of x is Gaussian with mean zero and variance unity, 
see 



5.3.2 Box-Miiller algorithm 



There is another ingenious way of generating Gaussian random numbers called the 
Box-Miiller method The Gaussian distribution of zero mean and unit variance is 
given by. 



1 



exp - — 



/2tt ^ V 2 

The corresponding cumulative distribution is given by 



;i8) 



F(x) 



dyexp 



r 
2 



X 



1 + erf ( - ^ 



(19) 



where erf(-) denotes the error function, which is not analytically invertible. Hence 
we consider a two dimensional distribution obtained as the product of two identical 
Gaussians of zero mean and unit variance given by. 



f2{x,y) = fi{x)fi{y) = —exp 



(20) 
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Let us express the above in terms of polar coordinates r and 9 with the transformations 
defined by: x = r cos{6) and y = r sin(6'). We get 

/2(x, y)dxdy = exp (-^) rdr^ , (21) 

which suggests that the angle 9 is distributed uniformly between and 27r and r can 
be sampled by analytical inversion since, 

rr 

F(r) = / driTi exp 
Jo 

= l-exp(-rV2) (22) 

Thus we get the Box-Miiller sampling: from two random numbers and .^2, indepen- 
dently and uniformly distributed in the unit interval (0, 1) we get two independent 
Gaussian random numbers, 

qi = ^J-2\n^i cos(27r^2) 

q2 = v^^2k^sin(27re2). (23) 
For Gaussian there is an algorithm directly based on the Central Limit Theorem. 




5.3.3 Algorithm based on the Central Limit Theorem 

According to the Central Limit Theorem, the sum of N independent random variables 
each with finite variance, tends to a Gaussian in the limit N —>■ 00. Let y be the sum of 
N random numbers (independently sampled from uniform distribution in the interval 
to 1). The distribution of y tends to a Gaussian with mean N/2 and variance A/12 
as A ^ CX3. Define 




The random variable x has mean zero and unit variance. The value of x is confined 
between —\/3N and +\/3N and hence x tends to a Gaussian strictly in the limit of 
A — i> 00. But as shown in [HZI, even a convenient choice of A = 12 gives good Gaussian 
random numbers with errors not exceeding one percent or so inside a two sigma interval 
(-2, +2). 

A very large number of transformations, tricks and algorithms have been discovered 
for random sampling from non uniform distributions. In fact this activity continues 
to be a favorite pastime of the Monte Carlo practitioners. Standard texts on Monte 
Carlo methods pHll^EUl l^ contain detailed descriptions of several random sampling 
techniques. 
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6 How do we evaluate an integral by Monte Carlo 
method? 

For purpose of illustrating the technique of Monte Carlo, let us consider a simple 
problem of evaluating the following integral, 

b 

<l>{x)dx. (25) 

In a finite difference approximation, we divide the range (a, b) of x into N equal inter- 
vals. Let Xi denote the mid-point of the i^^ interval. The integral is approximated by 
the sum. 



In 



E'^(^^)' (26) 



/ 1\ b — a , , 

X^ = 2) ^^^^ 

In the above, instead of choosing Xj at regular intervals, we can choose them randomly 
and independently from a uniform distribution in the interval a to b. In other words, 
we set, 

Xi = a+{b-a) = 2, ■■■ , N (28) 

where {^j} are independent random numbers uniformly distributed in the range (0, 1), 
and carry out the sum in Eq. (j^Uj) . Let In denote the Monte Carlo estimate of the 
integral, obtained from a sample of size A^. It is intuitively clear that as ^ cxd the 
estimate In I. How large should be so that /at is a reliable estimate of / ? Before 
we answer this question let us consider the evaluation of the mean of a function h{x) 
of a random variable X. 
Formally we have, 

/ + CO 
h{x)f{x)dx, (29) 
-00 

where f{x) is the probability density function of the random variable X. h{x) is usually 
called the score function. How do we estimate (h) j ? Let us first consider the so called 
analogue simulation. We sample randomly a sequence of {xi : i = 1,2, ■ ■ ■ , N}, from 
the density /(x) and write, 

1 ^ 

hN = ^T.h{^^)- (30) 

i=i 

In the limit A^ — > 00, {f)h- Also by the Central Limit Theorem, in the limit 

N ^ 00, the probability density of the random variable tends to a Gaussian with 
mean {h)f and variance cr'^/N, where 



h{x) - {h)j\ f{x)dx. (31) 
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Thus we say that analogue Monte Carlo estimate of is given by hj\i±a/\/N, where 
±cr / \/N defines the one-sigma confidence interval. This means that with a probability 
p given by, 



P 



N r{h)f+(a/^/N) 



a\j2TX J{h)j-{cr/^m) 



exp 



2a2 



dx 



1 



+1 



exp 



y_ 

' 2 



dy 



0.68268 



(32) 



we expect hj^ to lie within ia/y/N around {h)f, if is sufficiently large. First we 
notice that we do not know cr. Hence we approximate it by its Monte Carlo estimate 
Sn, given by, 



N 



1 

N 



N 



1 



N 

1=1 



(33) 



The quantity ±Sn/vN is called the statistical error. Notice that the sample size N 
must be large for the above estimate of the error (and of course of the mean) to be 
reliable. 

Now, returning to the question of the reliability of Jtv as an estimate of the integral I 
given by Eq. (j2S)), we immediately see that the associated statistical error is ±SN/y/N, 



where S% 



Jn-[I 



N 



and 



N 



J, 



N 



N 



(34) 



The statistical error is independent of the dimensionality of the integral. The error 
in a deterministic algorithm, on the other hand, is proportional to A^"^/*^, where d is 
the dimensionality of the integral. Thus, Monte Carlo method becomes numerically 
advantageous, for o? > 3. 

It is clear from the above that the statistical error is directly proportional to a and 
inversely proportional to y/N. The computer time however is directly proportional to 
A^. If the problem on hand has inherently large a, then for calculating averages within 
desired (small) statistical error bars, we shall need a very large sample of microstates; 
generating a large sample is often not possible within meaningful computer time. Then 
analogue simulation is going to be extremely difficult and most often impossible. We 
need to resort to techniques that reduce the variance without in any way changing the 
averages of the desired quantities. These are called variance reduction techniques and 
in what follows I shall describe one of them called importance sampling. 



7 What is the basic principle of importance sam- 
phng? 

Importance sampling helps us sample from the important regions of the sample space. 
Consider the problem described in section IHl see Eq. (j29|) . Let f{x) be high where 
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the score h{x) is low and f{x) be low where h{x) is high. Then, high score regions of 
X shall get very poorly sampled. A finite sample Monte Carlo estimate of (h) / would 
suffer from poor statistics and would often be erroneous. This is a typical situation 
whence importance sampling becomes imperative. 

Let g{x) denote an importance density. We decide to sample randomly and in- 
dependently from the importance density instead of f{x). Define a modified score 
function H{x) as 

H(x) = h(x)^. (35) 

The expectation of H{x) over the importance density g{x) is identically equal to the 
expectation of h{x) over f{x): 



/+0O 
H{x)g{x)dx 
-oo 



h{x)^g{x)dx 
J-oo g[x) 

h{x) f {x)dx 



{h)f. 



(36) 



Thus we sample VLg = {xi : i = 1,2, ■ ■ ■ , N} randomly and independently from the 
importance density g{x). For each Xi e fig, calculate the unweighting - reweighting 
factor f{xi)/g{xi). Monte Carlo estimate of {H)g is given by, 



1 ^ 1 f(x] 



N 



Ar-»oo 



(37) 



Notice that as — > oo, — > {H)g = (h) /. 

Let us now calculate the statistical error associated with i^jy. It is adequate if we 
consider the second moment, since we have formally shown that the average value of 
h over /- ensemble is identically equal to the average of H over ^f-ensemble. In other 
words the mean is preserved under importance sampling. The second moment of the 
H, evaluated over the g^-ensemble is given by. 



Mi{H) 



/-too 
H [x)g{x)dx , 
-oo 

/■+°° h{x)f{x) h{x)f{x) 

J—oo 



g{x)dx , 



g{x) g{x) 

h'^{x)f{x)dx . 

On the other hand the second moment of h over the /-ensemble is given by. 



r+oo 




J—oo 





(38) 



h'^{x)f{x)dx. 



(39) 
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We find M^(if) 7^ M^{h). If we choose the importance density function g{x) properly, 
we can make M^(if) to be much less than M^{h). In other words we can get substan- 
tial variance reduction under importance sampling. The averages estimated employing 
importance sampling will be statistically more reliable than the ones calculated by 
sampling from f{x). This in essence is the principle of importance sampling. 



7.1 An illustration of importance sampling 

A simple problem would drive home this important point. Let f{x) = exp(— x) defined 
for X > 0. Consider a score function defined as, 

{0 for X < T 
(40) 
1 for X >T . 

These two functions are depicted in Fig. (j2t) for T = 5. Notice that the score function 
h{x) is zero for x < T - a. region of x where the probability is very high. Hence most 
of the values of x sampled from the density f{x) will all be in this region and the 
corresponding scores are zero. On the other hand, the score function h{x) is unity 
for X > T - a region of x where the probability is negligibly small. This high score 
region is going to be scarcely sampled; often this region is almost never represented in 
a finite sample. Let us consider an importance density g{b,x) = bexp{—bx) defined 
for X > and < 6 < 1 where 6 is a parameter to be optimized for minimum 
variance. Fig. depicts the importance density and the modified score function 
H{x) = h{x)f{x)/g{b, x) for T = 5 and 6 = 6 = 0.18. For this problem all the statistics 




Figure 2: (a) The probability density f{x) and the score function h{x) for T = 5. The score 
function h{x) is zero for x < 5, where the probability is high; h{x) = 1 for x > 5 where the 
probability is negligibly small, (b) The importance density function g(b, x) and the modified 
score function H{x) = h{x)f{x)/g{b,x), where b = 0.18 . 
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can be calculated analytically. The mean and variance of h over the /-ensemble are 
given by, 

{h)f = cxp(-T) 
a\h) = {h)f{l-{h)f) (41) 

Table (1) gives the mean and the relative standard deviation of h over the /-ensemble 
for several values of T. The last column gives the size of the sample required in a 
Monte Carlo simulation, to estimate the mean within ±10% statistical error by random 
sampling from the /-ensemble. We see from Table (1) that as T increases (/i)^ decreases 

Table 1: /-ensemble: exact analytical results 



T 




a{h)/ {h)f 


N ^ima'^{h)/ {h)] 


3 


4.98 X 10-2 


4.37 


1.91 X 10^ 


5 


6.74 X 10-3 


1.21 X 10^ 


1.47 X 10^ 


10 


4.54 X 10-^ 


1.48 X 10^ 


2.20 X 10^ 


20 


2.06 X 10-9 


2.20 X 10^ 


4.85 X 10^° 



and the relative fluctuation increases. We flnd that the sample size required to predict 
the mean within ±10% statistical error is over 48 billion for T = 20, a task which is 
nearly impossible on any computer. 

Let us see how the use of importance sampling renders possible the impossible. 
The mean of H over the gi-ensemble is identically equal to the mean of h over the 
/-ensemble. However the variance of H over gi-ensemble is different and is given by, 

"'^(^•"'-S-'""- '''' 

It is now quite straight forward to calculate the value olh = h for which the variance 
is minimum. We get, 

^ = l + ^-/l+^- (43) 

Table (2) presents T, 6, cr(i7, h = h)/ {H)^ and the sample size required to estimate the 
mean within ±10% statistical error under importance sampling Monte Carlo. We find 
from Table (2) that use of importance sampling would lead to a considerable reduction 
of variance especially for large T. Take the case with T = 5. Use of importance 
sampling would reduce the statistical error by a factor five or so. As a consequence a 
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sample of size 650 is adequate to estimate {H) ^ = (h) j whereas from the /-ensemble 
we would require to sample 14, 700 realizations to estimate the average within ±10% 
statistical error. The results for T = 20 arc more dramatic and bring home the need 
and power of importance sampling. The statistical error gets reduced by a factor of 
4247 when we employ importance sampling. We need only a sample of size 2687 from 
the g'-ensemble to estimate the mean within ±10% of the exact, as compared to the 
sample of size 48 billion from the /-ensemble. 

Table 2: Importance sampling: exact analytical results 



T 


b 




N^ma'(H,b)/{H)l 


3 


.28 


1.95 


381 


5 


.18 


2.55 


651 


10 


.09 


3.65 


1329 


20 


.048 


5.18 


2687 



Let us see below if the above conclusions based on analytical theories are borne 
out by Monte Carlo simulations. We have sampled explicitly 10, 000 realizations from 
the importance density 6exp(— 6a;), employing analytical inversion technique, described 
earlier, and calculate {H)^ as. 



TV 



N 



N 



Y,H{xi) ^ {H)gandN = 10,000, 



(44) 



where. 



1=1 



■ exp 



Xi 



0, 



, if > T, 
if Xi < T. 



(45) 



The statistical error is calculated as. 



N 



± 



Vn\ n 



N 



AT 



- ff2 



(46) 



Table (3) gives the estimated mean Hn, the relative statistical error, and the actual 
deviation of the Monte Carlo estimate from the exact value (/i) ^ . We observe from Ta- 
ble (3) that we are able to make a very good estimate of the desired average employing 
importance sampling. The corresponding results obtained by random sampling from 
the density f{x) of 50, 000 realizations (five times more than what we have considered 
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Table 3: Results of Monte Carlo sampling of 10, 000 realizations from the g-ensemble 



T 


Hn 


X 100 


^^""^^ X 100 


3 
5 

10 
20 


4.94 X 10-2 
6.76 X 10-^ 
4.52 X 10-5 
1.96 X 10-9 


±2.0% 
±2.6% 
±3.7% 
±5.4% 


-0.8% 
±0.3% 
-0.4% 
-4.9% 



Table 4: Results of Monte Carlo sampling of 50, 000 realizations from the / ensemble 



T 




t/',^ X 100 


X 100 


3 


5.03 X 10-2 


±1.9% 


±1.8% 


5 


6.68 X 10-3 


±5.5% 


-0.9% 


10 


6.16 X 10-^ 


±57.7% 


±35.7% 


20 









for sampling from the ^f-ensemble) are given in Table (4). We find from Table (4) that 
Monte carlo simulation based on sampling from the /-ensemble to estimate (h) j- with 
T = 20 is impossible. On the average, we can expect one in 49 million realizations 
sampled from the exponential density, to have a value greater than 20. The chance of 
getting a score in a simulation of 50, 000 histories is practically nil. 

The key point is that importance sampling helps you pick up rare events - events 
that have very low probability of occurrence. We need these rare events in our Monte 
Carlo sample because they have high scores; i.e. we are interested in investigating a 
phenomenon that concerns these rare events. For example we would like to have in 
our Monte Carlo sample the rare neutrons from a source that manage to penetrate a 
very thick shield and enter the detector kept on the other side. We shall employ, for 
example, an importance sampling scheme based on exponential biasing [^I43j . In the 
problem of Monte Carlo calculation of averages over canonical ensemble, we would like 
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to sample microstates with high Boltzmann weight. We shall employ the Metropolis 
importance sampling [IHl- In the study of first order phase transition, we would be 
interested in sampling the rare microstates that describe interfaces between the ordered 
and disordered phases. We would resort to multicanonical sampling jHlj. In fact we 
can say that without importance sampling techniques, Monte Carlo method is of rather 
very little practical use for simulation of a large variety of problems, be it in statistical 
physics or in radiation transport. 

With these preliminaries let us now turn our attention to sampling from a canonical 
ensemble of microstates of a macroscopic system, employing Monte Carlo. I shall 
illustrate this taking ferromagnetism as an example. We shall consider Ising spin 
model of magnetism. 

8 What is an Ising spin model? 

Unpaired electron spins couple and align. The sum of such tiny magnetic moments 
results in macroscopic magnetism. In the year 1923, Prof. Lenz proposed a very simple 
model of magnetism to his student, Ising, who solved it analytically in one dimension. 
Ever since, this model is called the Ising model. For an historical introduction to Ising 
model, see [Hj. In the Ising model, a spin has only two states: an up state and a 
down state. Let us denote the spin variable by the symbol Si, and in the model, we 
say that Si can take only values of +1, denoting the up state (t) or —1, denoting the 
down state (|). We organize the spins on a lattice and the index i refers to the lattice 
site. The lattice can be in general d dimensional. Let i and j refer to two nearest 
neighbour sites on the lattice and let Si and 5*^ be the spins on these lattice sites. 
Energy associated with the pair of nearest neighbour spins is given by ejj = —JSiSj. 
Thus when the two spins are aligned (up or down), the energy associated with them is 
—J; when not, the energy associated with the pair is + J. The value of J measures the 
strength of the spin-spin interaction. If J is positive, the interaction is ferromagnetic. 
If J is negative, the interaction is anti-ferromagnetic. We consider J > 0. Figure Q 
depicts two-spins configurations. There are a total of four possible configurations. The 
two configurations marked (a) are each of energy — J, since the spins are aligned. The 
two configurations marked (b) are each of energy +J since the spins are not aligned. 
Consider now Ising spins on a lattice. Energy associated with a spin configuration C 
is given by, 

EiC) = -JY.S,iC)S,iC) , (47) 

{id) 

where, the symbol denotes that the sites i and j are nearest neighbours. The sum 
is taken over all pairs of nearest neighbour sites in the lattice. We shall concentrate 
on ferromagnetism and hence J is positive. The energy is lowest when all the spins 
are aligned, either up or down. The Ising Hamiltonian, see Eq. (j47|) . remains the same 
if all the spins are flipped. An external magnetic field that couples to all the spins, if 
present, would break this symmetry of the Hamiltonian. Let us consider an external 
field in the up{^) direction. The external field couples to each of the Ising spin in the 
system with a strength B. The energy of the system when in the configuration C is 
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Figure 3: Two-spins configurations: (a) There are two aligned spin configurations. The 



energy of each configuration is e 



-J Si Sj = — J. (b) There are two configurations in each 



of which the spins are not ahgned. The energy of each configuration is e 



J Si Sj — ~\~ J 



given by, 



E{C) = -JY.S,iC)S,iC)-BY^S,iC) , (48) 

{iJ) i 

When the temperature is lowered below a critical value, even in the absence of an 
external field {B = 0), the symmetry is broken. This is called spontaneous symmetry 
breaking. 

It is clear that Ising spin system at very low temperature, ksT << J, will have 
low energy, aligned spins and large magnetization. On the other hand, at very high 
temperature, the spin system will have high energy, randomly oriented spins and hence 
no net macroscopic magnetization. The system transforms from a disordered param- 
agnetic to an ordered ferromagnetic phase, when the temperature is lowered below a 
critical value called the Curie temperature. 

Ising solved the model in one dimension analytically and showed that there is no 
phase transition: Magnetization M(T) decreases slowly and continuously as T in- 
creases. The susceptibility x = dM/dT is finite at all temperature. There is no 
divergence, either of the specific heat or of the susceptibility. But, under a mean field 
approximation, the Ising model exhibits phase transition at T = Tc = rjJ/k-Q, where 
rj is the coordination number of the lattice. This raised a serious doubt whether 
the statistical mechanics machinery can at all describe the phenomenon of phase tran- 
sition. After all, one can argue that the phase transition predicted is an artifact of 
the mean field approximation. This serious dilemma was settled, once and for all, by 
Onsager 03] • He solved the two dimensional Ising model exactly; it shows phase tran- 
sition just like real magnets, at finite temperature. Onsager showed that the transition 
temperature for two dimensional Ising system is given by, k-oTf./ J = 2/ln(l + a/2). 



The coordination number 77 is 2 for one dimensional lattice; 4 for two dimensional square lattice; 
6 for two dimensional hexagonal lattice; 6 for three dimensional cubic lattice; etc. Under mean field, 
the critical temperature does not depend on the dimensionality; it depends only on i] - the number of 
nearest neighbour interactions and J - the strength of interaction. 
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9 How do we simulate an Ising spin system? 

Let P{C) denote the probability with which a microstate C (of an Ising spin system) 
occurs in a closed system at temperature T = l/(fcB/3). It is given by, 

P{C) = ^^exp[-(3E{C)] , (49) 
where the normalization, known as the canonical partition function, is given by 

Z{(3)=J2exp[-f3E{C)] , (50) 
c 

and E{C) is the energy of the microstate C. We say that a Boltzmann weight given by 
exp[—PE{C)] is associated with the microstate C. Let V denote the total number of 
spins in the system. It is readily seen that the number of possible spin configurations 
(microstates) is 2^. Let us consider a simple minded approach to simulate the system; 
we shall call this analogue Monte Carlo. It consists of sampling microstates randomly, 
independently and with equal probabilities. Assemble an ensemble of N equi-probable 
microstates; carry out (Boltzmann) weighted average of the macroscopic quantity, say 
magnetization M, over the ensemble as given below. 



um.jj EiIiM(C.)exp[-/3E(C.)] 
Ei=iexp[-/3E(Ci'' 



A real magnetic system would contain an extremely large number of spins; i. e. V shall 
be of the order of 10^^. Let us consider a modest model system with some hundred 
spins on a two dimensional 10 x 10 square lattice. The number of spin configurations is 
2100 ^ 10^°. Even if we assume optimistically that it takes a nano second to generate 
a spin configuration, the total time required to sample all the spin configurations is 
nearly of the order of thirty thousand billion years! In a Monte Carlo simulation, 
we sample only a very small number of spin configurations, say ten thousand or so; 
with modern high speed computers it may be possible to generate say a few billion 
microstates. Unfortunately, because of the Boltzmann weight exp[—PE{C)], most of 
the spin configurations randomly generated would contribute very negligibly to the 
sum. The associated fluctuations shall be very large. Hence we resort to importance 
sampling. 

In importance sampling, the idea is to select spin configurations randomly and in- 
dependently from the distribution P{C) = exp[—PE{C)]/Z. These spin configurations, 
when sampled adequately large in number, constitute, to a very good approximation, 
a canonical ensemble. Therefore average of a macroscopic property can be calculated 
as a simple arithmetic mean over the sampled microstates. But to carry out this task 
we need to know the partition function, Z{T). But then Z is precisely what we want 
to calculate in the first place. Thus there is a catch. But then fortunately there is a 
way out, as shown by Metropolis and his co-workers [IB] . 

Construct a Markov chain of configurations, starting from an arbitrary initial con- 
figuration Cq. Let Co ^ Ci ^ • ■ • ^ C„ ^ C„+i ^ ■ ■ ■ represent a Markov chain of 
spin configurations. The index n can be viewed as denoting time. If we establish that, 
for large n, the set {C^+i, C„+2, ■ ■ ■} constitutes an equilibrium canonical ensemble, we 
have done the job. 
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10 What is a Markov chain? 

A Markov chain is a discrete time stochastic process. Consider the Ising spin sys- 
tem which can be described at any discrete time n as being in any one of the mi- 
crostates (spin configurations) belonging to the closed system at given temperature; 
the microstates are denoted by the script alphabet: {Ci, C2,---C^^^}. The set of 
all microstates are denoted by fics and the number of microstates belonging to the 
closed system is denoted by fics- Let C„ be the microstate of the system at the 
discrete time n. Cn is random and can be any one of the microstates belonging to 
^cs = {^1,^2, ■ ■ 'Cq^^} with certain probabilities. If the system is in equilibrium the 
probability for to be Ci is given by P{Cn = Ci) = exp[—PE{Ci)]/Z, and is indepen- 
dent of the time index n. 

The system present in microstate C„ G flcs at time n makes a transition to mi- 
crostate C„+i e flcs at time n + 1, according to certain probabilistic rules. The discrete 
time stochastic evolution of the system starting from an initial state Cq G Qqs at time 
zero until time n is completely specified by the joint probability P(Co, Ci, ■ ■ ■ C„); this 
joint probability can be expressed in terms of conditional probabilities see page 33 
and 34 of A. Papoulis listed in P as, 

P{Cn, C„_i, ■ • ■ Ci, Co) = P(C„|C„_i, Cn-2, ■ ■ ■ Ci, Co) X 

Cn-2) Cn-3, ■ ■ ■ , Ci, Co) X 
-P(C„_2|C„_3, C„_4, ■ ■ ■ , Ci, Co) X 
X 

P(C2|Ci,Co) X 
P(Ci|Co) X 

P{Co) (52) 

The sequence of random variables Co, Ci, • ■ ■ C„ constitutes a Markov chain, if 

P{Ck\Ck-u Ck-2 ■ • • Ci, Co) = P(Cfc|Cfc_i) \/k = l,n. (53) 

As a consequence we have 

P(C„,C„„i,---Ci,Co) = P(Co)P(Ci|Co)P(C2|Ci)---P(C,|Cn-i) . (54) 

Physically the Markovian assumption implies that the microstate the system is going 
to visit at time n+1 depends only on the state it is present now at time n and not 
where it was at all the previous times. The past has no influence over the future 
once the present is specified. We restrict ourselves to a time invariant or also called a 
time homogeneous Markov chain for which, regardless of the time index k, we have, 

PiCk+i = Ci\Ck = Cj) = Wij Wk (55) 

^^Let B be an event with non-zero probability, P{B) > 0. Consider an event A. We define 
the conditional probability of A assuming B has occurred as P{A\B) — P{A n B)/P{B). In words, 
P{A\B) equals the probability of that part of A included in B divided by the probability of B 

^^The Markov process is the stochastic equivalence of the familiar Newtonian dynamics 
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where Wij is the probability for the transition from microstate Cj to microstate 
in a single step. Thus Wij is the i,j—th element of the fics x ^cs square matrix. 
The stochastic matrix W completely specifies the stochastic dynamical evolution of 
the system given the initial state Co at time n = 

Let P{Ci,n) denote the probability that the Ising spin system is in a configuration 
Ci at time n. In other words P{Ci,n) is the probability that C„ = Cj. Formally we 
have the discrete time Master equation, 



n 



1) = Y.W^,P{C, 




Y: W.,PiC,,n)-W,,P{C. 



n] 



P{Cun) 



(56) 



We need 



P(Ci, n + l) = P{Ci, n) = 7i{Ci) =ni W I 



(57) 



when {n oo), for asymptotic equilibrium. In the above 7r(Cj) = vTj is the equilibrium 
probability of the microstate Ci. i.e. tTj = exp[—/3E{Ci)]/Z. We can ensure evolution 
to asymptotic equilibrium by demanding that each term in the sum over j in the 
RHS of Eq. (jSUjl be zero. Notice that this condition is sufficient but not necessary for 
equilibration. This is called the detailed balance condition: 



which implies that 



W, 



— = exp 

Hi 



P{E{Q-EiCj)} 



exp [-pAE] 



{51 



(59) 



The important point is that only the ratios of the equilibrium probabilities appear in the 
above. These ratios are known; they are just the ratios of the Boltzmann weights; we 
do not need to know the normalization (partition function) for constructing a transition 
matrix, see discussions toward the end of section Q. 

A task now is to construct an algorithm that takes the system from one microstate 
to the next as per a transition matrix whose elements obey Eq. (j59p . We have already 
seen that the number of microstates of a closed system is fics = 10^° even for a small 
system of Ising spins on a 10 x 10 square lattice; the W matrix for this problem will 
thus contain some 10^° elements! Constructing explicitly such a matrix, storing it, and 
carrying out operations with it are neither feasible nor required. What we need is an 
algorithm that takes the system from one microstate to another which in effect mimics 
the transition induced by the matrix W. The Metropolis algorithm does this. 

11 What is the Metropohs Algorithm? 

The Metropolis algorithm defines a stochastic dynamics which generates, starting from 
an arbitrary initial microstate Co, a Markov chain of microstates, given by. 



Co ^ Ci ^ C2 



Cj. — > C 



fe+1 



(60) 
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Let us see how does the transition: Ck Ck+i, take place as per Metropolis algorithm. 
We select one of the spins randomly and with equal probability (= l/V); let Si denote 
the selected spin which is at lattice site i. We flip the selected spin {Si —Si) and 
get a trial conflguration denoted by Ct- This step constitutes the selection part of the 
transition matrix W. Let AE = E{Ct) — E{Ck)- We accept Ct as the next microstate 
in the Markov chain with a probability given by, 

p = mm 1 



mm 



{I,exp[-PAE] ) . (61) 



In other words, 



Ct with probability p 
Cfc+i = { (62) 
Ci with probability 1 — p 

This step constitutes the acceptance / rejection part of the transition W. Executing 
these two steps constitutes one Monte Carlo Step (MCS). It is easily verifled that the 
Metropolis transition described above obeys the detailed balance condition. 

Let me recapitulate the different steps involved in the practical implementation of 
the Metropolis algorithm.: 

• Start with an arbitrary initial microstate Cq G Qqs and evolve it : Cq ^ Ci ^ 

C2---. 

• select randomly and with equal probability one of the spins in the current con- 
flguration, Cfc. Let Si denote the selected spin. Flip the spin (5*^ —Si) and get 
the trial conflguration C^. Calculate AE = E{Ct) — E{Ck). 



If AE < 0, accept the trial state; C^+i = C 
if AE' > draw a random number ^ 

if ^ < exp{—pAE), accept the trial state; C^+i = C 



o if ^ > exp{—pAE) reject the trial state; C^+i 



'i 

■k 



= C, 13 



Iterating the above we get a Markov chain of microstates. In a single Monte Carlo 
Step (MCS) the system switches from the current state C^ to the next state C^+i. 
A consecutive set of V number of MCS constitutes one Monte Carlo Step per Spin 
(MCSS). The time scale for the Metropolis dynamics is set by an MCSS. We construct 
an ensemble of microstates from the Markov Chain by picking up microstates at the 
end of every MCSS: {Co, Cy, C2y, ■ ■ ■}. Discard conflgurations at the beginning of 



^•^The rejection part of the algorithm is important for reahzing equihbrium ensemble; there are 
however rejection free algorithms; in these, the information on the dynamics is transferred to the 
time axis; these are called event driven algorithms; the Metropolis algorithm is time driven. We shall 
discuss an event-driven algorithm called the n-fold way, later. See section (|15|l 
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Figure 4: Acceptance probability p. The solid line refers to Metropolis sampling: p = 
mm{l, exp(— x)}. The dashed curve refers to Glauber dynamics: p = 1/[1 -|-exp(+x)]. In 
the above x = (3AE. 



the chain; Then the configurations in the asymptotic part of the chain constitute a 
canonical ensemble, which can be employed for calculating the desired macroscopic 
properties of the system. 

Fig. (jH) depicts the probability p of accepting a trial configuration Ct produced 
from the current configuration C^, as a function of x = l3[E{Ct) — E{Ck)]- It should be 
remarked that we can devise several other transition matrices consistent with detailed 
balance. For example, in the Metropolis algorithm, the derivative of the acceptance 
probability p{x) is not continuous at x = 0, see the solid line in Fig. (@)). Hence 
Glauber [50^ proposed a dynamics where the probability p of accepting a trial state C^, 
constructed from the current state C^, is given by 

exp [-f3E{Ct)] 
^ exp[-/3EiCk)]+exp[-/3E{Ct)] 

1 

1 + exp [P{E{Ct)-EiCk)}] 



1 + exp(x) 

The above choice of acceptance probabihty is also consistent with the detailed balance. 

There is another dynamics in vogue called the heat-bath algorithm. We define 
hi = J2j Sj, where the index j runs over all the nearest neighbour spins of S^. We 
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define a probability 



The heat-bath algorithm jHI] consists of updating the spin Si{k) to Si{k + 1) where 
Si{k + 1) = ±1 with probabilities Pi and 1 — Pi respectively. This is implemented by 
drawing a random number ^ (uniformly distributed in the interval to 1) and setting 
Si{k + 1) = SIGN{pi — where the function SIGN{rf) equals +1 if ?7 > and —1 if 
< 0. Thus the spin at site i behaves as if it is in its private heat bath constructed 
by its nearest neigbbours! Let us now cast the Glauber algorithm in the language of 
the heat-bath algorithm employing pi. 

In the Glauber algorithm, if Si{k) = +1, then the probability Si{k + 1) = — 1 
is p = 1 — Pi] therefore, Si{k + 1) = —SIGN{1 — Pi — ^). On the other hand if 
Si{k) = —1, then the probability that Si{k + 1) = +1 equals p = Pi and hence 
Si{k + 1) = SIGN[pi — It is in this sense that the Glauber dynamics differs from 
the heat-bath algorithm. 

However, it readily seen that, in the Glauber algorithm. 



P{Si{k + l) = +1 



S,{k) = +l)=P{S,{k + l) = +l 



Si{k) = -1^ =pi 



p(^S,{k + 1) = -1 S,{k) = +l) = p(^S,ik + !) = -! S,{k) = -l) = 1 - , (65) 

and these probabilities are identical to those of the heat-bath algorithm; hence the 
Glauber algorithm differs from the heat-bath algorithm only in a minor and irrelevant 
technical detail. 

I think we understand why do the Metropolis transitions take the system to equi- 
librium eventually. A system has a natural tendency to lower its energy; hence if a 
trial microstate has lower energy it is accepted with unit probability. But there are 
fluctuations present all the time; fluctuations are a part and parcel of an equilibrium 
system. It is precisely because of these fluctuations that an equilibrium system man- 
ages to remain in equilibrium. The fluctuations also correspond to measurable physical 
properties of the macroscopic system, e.g. the specific heat corresponds to energy fluc- 
tuations. Hence when the trial microstate is of higher energy, we still accept it but 
with a probability less than unity; larger the energy increase, lower is the acceptance 
probability, since in equilibrium larger fluctuations are rarer. 

That the Metropolis algorithm assures asymptotic equilibrium can also be shown 
rigorously mathematically. The transition matrix W, whether based on the Metropolis 
dynamics or the Glauber/heat-bath dynamics, has the following properties: Wij > 
V z, j and J2j Wj^i = 1 W i. This second property is just a normalization condition: 
the system has to be in any one of the microstates including the present one, after a 
transition. Using the normalization in Eq. (jSUjl . we get, 

PiCi,n + l) = ^iyi,,P(C„n) V^. (66) 



^^Fluctuation dissipation theorems relate the equilibrium fluctuations to nonequilibrium or more 
precisely near equilibrium responses. 
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The above can be written in a convenient vector notation 

\P{n + l)) = W\P{n)) (67) 

where the components of the column vector \P{n)) are the probabilities of the mi- 
crostates of the closed system at time n and the dimension of the vector is equal to 
the size of the sample space. Specifying the sample space and the probabilities of the 
microstates of the sample space uniquely defines an ensemble. Hence we shall call 
\P{n)) an ensemble at time n. Thus the transition matrix acts on the ensemble \P{n)) 
and transforms it to the ensemble \P{n+ 1)). Let us start with an arbitrary ensemble 
\Po), such that (ttIPq) 0; in other words the initial ensemble chosen has a non-zero 
overlap with the equilibrium ensemble. Let W act on \Pq) successively some n times 
and transform it to \P{n)). 

WlPo) = \P{n)) (68) 

We want \P{n)) \tt) as n — >• oo and any subsequent transition should not change 
the ensemble. For this to happen the largest eigenvalue of W must be real, non- 
degenerate^^ and unity; all other eigenvalues must be smaller than unity in modulus; 
also the eigenvectors of W must form a complete set. We can then express the initial 
ensemble in terms of the eigenvectors of W. Upon repeated operations by W, only the 
eigenvector of the largest eigenvalue survives. Hence equilibrium ensemble is the right 
eigenvector of W corresponding to the (largest) eigenvalue unity: 

W\7r) = \n) . (69) 

Ivr) is called the equilibrium or the invariant distribution of the matrix W. The Peron 
- Frobenius theorem j^Tj^H] ensures that such an asymptotic convergence to an equi- 
librium ensemble is possible if the transition matrix W has the following properties: 

• W^j > V z,j 

• Wj,, = 1 V 2 

A transition matrix W is called ' balanced ', if it obeys the above two conditions. In 
additionn we demand the Markov chain to be ergodic: {W"^)ij > V z, j and m < oo. 
This ensures that every microstate (g Clcs) is reachable from every other microstate 
(g Clcs) in a finite number of transition steps. If the transition matrix obeys a more 
restricted detailed balance condition, then the above two requirements are automati- 
cally fulfilled and hence is ' balanced '; detailed balance condition is sufficient though 
not necessary for asymptotic convergence to equilibrium, see below. 

There are several Monte Carlo techniques in vogue, that do not satisfy detailed 
balance condition, see e.g. but nevertheless assures march to equilibrium. As we 
have seen above, the (weaker) balance condition suffices for asymptotic equilibrium, 
see for e.g. [53 . 

^^the equilibrium corresponds to a unique ensemble 

^^once the system reaches equilibrium it should remain in equilibrium 

^^In fact we can define a transition matrix W called the reversal of W [3^ or the 7r-dual of W |55j . 
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12 How do we calculate Monte Carlo averages and 
error bars? 

We calculate the required macroscopic property by averaging over the ensemble con- 
structed as per the Metropolis rejection technique described earlier. For example, the 
energy given by, 

E{C) = -JY.S,{C)S,{C) , (70) 

when the Ising spin system is in the configuration C. Similarly, the magnetization in 
the microstate C is given by 

M(C) = Y.S^{C). (71) 

i 

The above and any other macroscopic quantity of interest can be averaged over an 
ensemble generated by the Monte Carlo algorithm. 

Let Ci, C2, , ■ ■ - Cat be the N successive microstates in the asymptotic part of a 
Markov chain generated by the Metropolis algorithm. Let Xi = x{Ci) be the corre- 
sponding values of a macroscopic property of the system, x can be M, E, M^, 
etc. An estimate of the average of x is given by 

1 ^ 

i=l 

which in the limit of ^ 00 gives (x). We call xn as a finite sample estimate of (x). 
We can estimate from the ensemble, all the macroscopic properties of the Ising spin 
system like (E), and (M); Employing fiuctuation dissipation theorems we can estimate 
specific heat given by 

r _dE_{E^l-^ 

- - k^T^ ' ^^^^ 
and the magnetic susceptibility given by 

(74) 

The Monte Carlo estimate xat is always associated with statistical error since is finite. 
The standard deviation of xat is the one-sigma confidence interval or the statistical 



It is given by, 

W = diag(7r)WMiag(l/7r) 

where diag(7r) denotes a diagonal matrix whose diagonal elements are the components of the equi- 
librium vector \tt), and denotes the transpose of W. Both the transition matrices W and W 
have the same invariant distributions^ W describes the time reversed markov evolution. If W obeys 
the detailed balance condition then W = W. The matrix W has been found useful in several recent 
studies in nonequilibrium statistical mechanics like fluctuation theorems, work-free energy relations, 
etc., see for e.g. 
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error associated with xn- We express the statistical error employing the Central Limit 
Theorem: we present the Monte Carlo result as 



sigma confidence interval. The key idea is that in the limit — > oo the random variable 
xn becomes Gaussian distributed with mean {x) and variance a'^{x)/N. Notice that 
asymptotically (A^ oo) the statistical error goes to zero as N~^/'^. In other words 
the distribution of the sample average tends to a delta function in the limit N ^ oo. 

In calculating the statistical error, replace a^{x) by the sample fluctuations, denoted 
by 5*^ and given by, S]^ = J2iLi{xi — xnY /{N — 1) . Equation (ffHjl for the statistical 
error holds good only if the Ising spin configurations sampled are all uncorrelated. The 
configurations generated by the Metropolis algorithm at successive MCSS are usually 
correlated. Hence the actual statistical error would be higher than what is calculated 
employing Eq. ()75|1 . I shall have more to say on this later. 

Before we go further and take up more advanced topics there are a few details and 
few issues we should attend to. For example before taking average over the ensemble, 
we must ensure that the Markov process has equilibrated, i.e., P{C,t) -P(C), as 
t ^ oo V C G ^^cs- The Ising system should have forgotten the arbitrary initial spin 
configuration. How do we find this? First we must check if the average converges to a 
constant value: split the data into several bins and calculate the average for each bin. 
Discard the initial bins which give averages different from the latter. Another way is 
to calculate the autocorrelation function and from it, the correlation time r* in units 
of MCSS. Discard data from the initial, say 10 r* MCSS. 

Another problem is related to the issue of ergodicity. The system can get trapped 
in a region of the configuration phase and not come out of it at all. It is also possible 
that the dynamics is quasi-ergodic, in the sense that the system gets trapped in local 
minimum and is unable to come out of it in any finite time due to the presence of high 
energy barriers. A possible check to detect these is to carry out several simulations 
starting from different initial spin configurations and see if all of them give more or 
less the same results within statistical fluctuations. 

The next question concerns boundary conditions. Periodic boundary conditions are 
often employed: the lattice wraps around on itself to form a torus. Consider a two 
dimensional square lattice of sites. The first site in a row is considered as the right 
nearest neighbour of the last site in the same row and the last site in a row is considered 
as the left nearest neighbour of the first site in the same row. The same holds for the 
top and bottom sites in each column. Periodic boundaries are known to give least finite 
size effects; we shall see about finite size effects sometime later. There are also several 
other boundary conditions e.g. rigid, skew periodic, anti periodic, anti symmetric, free 
edge etc., that are implemented depending on the nature of the problem. 




(75) 



where a^{x) = (x^) — (x)"^. The quantity cr(x) / is called the statistical error, or one- 
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13 How does an Ising spin system behave in the 
vicinity of phase transition? 

To distinguish one phase from the other we need an order parameter; magnetization M 
is a good order parameter to study phase transition in ferromagnetic systems. Usually, 
the order parameter is zero in the high temperature disordered paramagnetic phase 
and non-zero in the low temperature ordered ferromagnetic phase. In the Ising model 
simulation, we shall be interested in the second order phase transition. 

The magnetic phase transitions are characterized by critical exponents. In the limit 
of T Tc, we have 



M{T) 
V 

X{T) 
V 

Cv{T) 
V 



in - Tf for T < , 



\T-T, 



-7 



c| ) 



|T-Tc|-". (76) 



In the above V is the total number of spins in the system. (3, 7 and a are called the 
critical exponents. For the two dimensional Ising model system /5 = l/8, 7 = 7/4 and 
a — Q. In fact, the specific heat goes like 



^ X In ( I T - Tc I -^). (77) 



The correlation length at temperature T, denoted by ^(T), which measures the typical 
linear size of a magnetic domain, goes like ^(T) ~ \T—Tc\^^, and the exponent 1/ — 1 for 
the two dimensional Ising model. The critical exponents are universal in the sense their 
values do not depend on the details of the model; they depend only on gross features 
like dimensionality of the problem, the symmetry of the Hamiltonian etc. Indeed this 
is precisely why very simple models like the Ising model are successful in describing 
phase transition in real systems. 

The correlation length ^(T) is expected to be of the order of zero when T » Tc. At 
very high temperature, the spins behave as if they arc independent of each other. Spin- 
spin interactions are irrelevant compared to the thermal fluctuations. Entropy wins 
over energy completely. The macroscopic properties of the system are determined by 
entropic considerations only. As T decreases, the spin-spin interaction becomes more 
and more relevant and the correlation length diverges as T —> Tc. If the system studied 
is finite, the correlation length, at best, can be of the order of the linear dimension of 
the system. This brings us to the important topic of finite size effects and finite size 
scaling. 
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14 What is finite size scaling and how do we imple- 
ment it? 

At the outset, we recognize that a finite system can not exhibit phase transition. Con- 
sider a square lattice of size say Lx L. If the correlation length ^(T) << L, then for all 
practical purposes, a finite system can be taken as a good approximation to an infinite 
system. In other words, our simulations on a finite lattice would give good results if 
the temperature T is not close to Tc, i.e. if ^(T) « L. But when T is close to Tc, 
the correlation length becomes of the order of L. A finite system thus can not show 
any divergence of the specific heat or susceptibility; we would get a sort of rounded 
peaks at best, instead of sharp divergence. Of course as L becomes larger and larger, 
the peak would become sharper and sharper. We present in Fig. (jS)), magnetization 
per spin, |M|/V" (where V = L^), as a function of J/ksT for two dimensional square 
lattices of size L = 2,4, 8, 16, 32 and 64. It is clear from the figure that the transition 
becomes sharper as the system size increases. Similar plots for magnetic susceptibility, 
are shown in Fig. (jHl). The peak is rounded and broad for smaller system sizes. As the 
system size increases we find the peak becomes sharper and its height increases. The 
behaviour of specific heat as a function of temperature for different system sizes are 
shown in Fig. ((Tj). We recognize that the correlation length for a finite Lx L system 
can at best be L, when T is close to T^. Hence we can write 

mrZn = L (78) 

From this we get 

\T-T^\ ~ L-^l" (79) 



We can now express the magnetization, magnetic susceptibility, and the specific 
heat as power laws in L. 
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(80) 

The above finite size scaling is implemented as follows. For a given L, find the value 
of T at which the specific heat peaks. Call that temperature as Tc{L). Calculate, for 
example \M{T)\/V at T = Tc{L) for different values of L. Plot In [\M{L)\/V] against 
ln(L). The slope of the best fit straight line would give —(3 /v. 



^^These plots have been obtained from the computer program ISING.FOR, written and tested by 
my coUeague V. Sridhar, MSD, IGCAR, Kalpakkam. The program can be obtained from him by an 
e-mail to vs@igcar.ernet.in . 
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Figure 6: Magnetic susceptibility per spin (x/L^) vs. J/k^T for various system sizes L 
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So far so good; we know now how to assemble, employing random numbers, a 
Markov chain of microstates each produced from its predecessor through an attempted 
single flip. What happens when the flipping probability is very very small? The 
system has a natural tendency to remain stay put in a microstate step after step after 
step. Nothing happens to the dynamics for a very long time. But the Monte Carlo 
clock is ticking all the time. Can such a wastage of computer time be avoided? In other 
words, can we simulate a very slow dynamics by a fast algorithm? The answer is yes; 
in the year 1975, Bortz, Kalos and Lebowitz [57j proposed an event-driven algorithm 
that precisely does this. They called their algorithm the n-fold way. 



15 What is the n-fold way? 

The n-fold way is an event - driven algorithm. We ensure that an event happens at 
every algorithmic time step. In the context of Ising model simulation, this means that 
at every MCS we have a new spin configuration. Let C^ denote the spin configuration 
at time k. As we have already explained, this configuration belongs to one of the 
microstates, say Ci of the sample space flcs underlying the closed system. Define 
as the probability for the system to persist in its current microstate Ci for one MCS. 
Afc is formally given by = Wi^i = 1 — ^j,h where W is the transition matrix 

""^^the flipping probability can become small for several reasons: (a) when the temperature is very 
low; (b) when the system is in or very close to equilibrium (c) when the system is in a metastable 
state (d) when the system is very close to a critical state etc. 



36 



K. P. N. Murthy 



underlying either the Metropohs or the Glauber dynamics. The probability that the 
system continues to be in the current microstate for the next m MCS and makes a 
transition to a different microstate in the (m + l)-th step is given by the geometric 
distribution, 

p{m) = Ar(l-A,) (81) 

The first part of the n-fold way consists of assigning a life time to the microstate Ck- 
It is sampled from its distribution, see Eq. (jHT|) by a simple analytical inversion: 

where ^ is a random number (uniformly distributed in the range to 1); we need the 
life time in terms of MCS; hence we calculate m(Cfc) = [T(Cfc)J where the RHS denotes 
the largest integer less than or equal to T(Cfc). Thus in one shot we have advanced the 
Markov Chain by m steps: 

{■ ■ ■ ) = Ci, Ck+l = Ci, = Cj, ■ ■ ■ , Ck+m = Cj, ■ ■ ■}• 

The second part of the n-fold way consists of determining C^+m+i- This can be obtained 
in principle from the the transition matrix W. The whole process is repeated and this 
constitutes the n-fold way of generating a Markov chain of microstates. 

I said that and C^ can be obtained, in principle, from the transition matrix 
W. But in practice, we do not need the transition matrix; also it is not desirable 
nor feasible to work with the transition matrix since its size is rather large; as we 
have already seen, even for a small system of Ising spins on a 10 x 10 square lattice, 
the associated transition matrix would contain some 10^° elements! The practical 
implementation of the n-fold way proceeds as follows. All the spins in the system can 
be put into n classes where n is a small manageable number; Let rii be the number 
of spins in the class i when the system is in a microstate C^ at time k. It is clear 
J27=i '^j = ^- What is the value of n and what are these n classes? Consider a two 
dimensional square lattice with periodic boundaries. Let us consider a spin which is 
oriented up (t). Call this the reference spin. The local environment of the reference 
spin is one of the following five types: (i) all the four nearest neighbours are up; (ii) 
three nearest neigbours are up and one is down; (iii) two nearest neighbours are up 
and two are down (iv) one nearest neighbour is up and the other three are down and 
(v) all the four nearest neighbour spins are down. There are thus five classes when 
the reference spin is up; When the reference spin is down, there are five more classes, 
making the total number of classes 10. Thus each of the V spins in the system belongs 
to one or the other of these n classes, hence the name n-fold way, where n is 10 in the 
present example. Let us define a local energy for each spin; it is given by the sum of 
its interaction energies with its four neighbours and with an external field B. We can 
flip the spin and calculate the change in the local energy. All the spins belonging to a 
class have the same local environment, same local energy and cause the same change in 
energy when flipped. Table (5) presents the ten classes and their common properties. 
Hence it is quite adequate, see below, if we consider only these ten classes for purpose 
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Table 5: The ten classes of the 10-fold way for a two dimensional square lattice with periodic 
boundaries 



Class 


reference 


number of 


Local energy 


AE 




spin 


nearest neigbour 
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+B 


-2B 
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1 


-2J + B 


+4 J - 2B 


10 


i 





-4J + B 


+8 J - 2B 



of obtaining and C^+^+i. Let AE{i) be the increase in energy when a spin of the 
i-th class is flipped, see the last column of Table (5). This quantity is the same for all 
the spins belonging to the same class. Let Pi be the probability of accepting a spin flip 
in the i-th class. It is given by, 



Pi 



min^l, exp 



1 + exp 



PAE{i] 



for Metropolis 



for Glauber 



(83) 



The value of pi depends on whether we we use the Metropolis dynamics or the Glauber 
dynamics. The probability of selecting a class is rii/V . Thus the probability of flipping a 
spin in the system is given by g = Zli£i '^iPi/V. This immediately gives us = 1 — g. 
Once we know we can sample a life time m from the geometric distribution as 
described ear her. We then select a class from the distribution — riiPi/ (qV) : i — 1,10; 
then a spin is chosen randomly from the selected class and flipped to give Cfc+^+i. 
Calculate the set {nj : j = 1, 10} for the new configuration. This kind of book- 
keeping is the time consuming part of the n-fold way; however the changes are confined 
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to the class indices of the flipped spin and its four nearest neighbours. These changes 
can be carried out employing simple rules: If the flip is from up (t) to down (|), the 
class of the flipped spin increases by 5 while that of each of its four nearest neighbours 
increases by 1. If the flip is from down (j,) to up (j), then the class of the flipped spin 
decreases by 5 and that each of the four nearest neighbours decreases by 1. Having 
updated the array {uj} and indexed each spin by its class, we iterate the entire process 
and get a Markov chain of microstates. The macroscopic properties are calculated from 
the chain in the usual way described earlier. 

Alternately we can store the life times {T(Cj)} of each of the microstates belonging 
to the Markov chain {Co, Ci, C2, • • •} where a microstate Cj is obtained from Cj_i 
through a spin flip in the n-fold way. While averaging we weight each microstate with 
its corresponding life time: 



Instead of sampling the life time from its geometric distribution one can use the average 
life time of a microstate Cj, given by. 



and employ it as weight while averaging. This completes the discussion on the n-fold 
way. 

The n-fold way is a particular case of a more general technique called the Monte 
Carlo with Absorbing Markov Chain (MCAMC) [5H]. I shall not discuss these tech- 
niques here and instead refer you to the beautiful recent review article of Novotny jSHj . 

The n-fold way is a fast algorithm that realizes a slow dynamics. As I said earlier, 
the dynamics of an equilibrium or near equilibrium system is slow; at very low tem- 
peratures the dynamics is slow; many a times the system gets stuck in a metastable 
state which drastically slows down the dynamics. There exists another phenomenon 
for which the Metropolis/Glauber dynamics is very very slow: when the system is 
close to a critical state. This is called critical slowing down - an important issue in 
the statistical mechanics of continuous phase transition. To this we turn our attention 
below. 

16 What do we mean by critical slowing down? 

A problem with Monte Carlo simulation is that statistical error decreases with increase 
of sample size only as N'^/^. This means that to get the numbers to just one extra 
decimal accuracy, we need to increase the sample size by a factor of hundred. Things 
are much worse. The error reduction as N~^^'^ happens only if the members of the 
ensemble are independent. But successive Monte Carlo configurations generated by 
Metropolis rejection technique are usually correlated jHO]. Such correlations increase 
the statistical error. Let us see how this comes about. First notice that given by 
Eq. (f?^ is a random variable. The mean of xn is (x). The variance of x^ is formally 
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(84) 
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given by, 



N 




N 



Ik 



16) 



where the correlation 7^ is given by 



7fc 



(87) 



In writing the above, we have assumed that the sequence Xi, X2, ■ ■ ■ , xat is stationary. 
The correlations are estimated from the Monte Carlo Markov chain as, 



7fc 



N Ei=i {xj - xn) {xj+k - xn) 
N-k Y.tl{x^-xr,f 

Let us now define the so called integrated correlation time as. 




^9) 



We can now express the Monte Carlo result as 



a[x) , 

Xn ± ^F= X Vl + 2r* 



N 



(90) 



It is clear that if we have an uncorrelated data set, r* = and the expression for the 
statistical error given by Eq. (plj) reduces to that given by Eq. (fTSjl . The value of r* is a 
measure of the number of MCSS we need to skip for getting a microstate uncorrelated 
to the present microstate, after the system has equilibrated. The important point is 
that the correlation time r* diverges as T — > Tc; this means that finite sample Monte 
Carlo estimates of the macroscopic properties of the system become unreliable when 
the temperature of the system is very close to the critical value. The divergence of r* 
as T — > T(7 is called critical slowing down. 

Thus we find that the statistical error depends on the correlation time. Is there a 
way to estimate the statistical error that does not require explicit calculation of the 
correlation function 7^? The answer is yes and we shall briefly discuss the so called 
blocking technique, which is essentially a real space renormalization group technique 
applied to the one dimensional discrete space of MCSS. 



17 What is a blocking technique ? 

For a lucid account of the blocking technique, see the paper by Flyvbjerg and Pe- 
tersen jHT]. I shall confine myself to discussing the implementation of the technique. 
We are given a set of correlated data. 



^0 — {x\ \ x2'^ ■ ■ ■ x\^^. 



(91) 
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How do we calculate the statistical error associated with the sample mean, 



1 ^ 

■'■ (0) 9 



(92) 



Let f2o denote the number of data points in the set fio- Note i7| 
sample variance given by, 



'0 — 



A^. Calculate the 
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(93) 



Let eg = S'^(f2o)/(^o ~ !)• Then transform the data to another set of data f2i = 
{xi'\x'2\---x^y^} which is half as large, i.e. Cli = (lo/2 = N/2. In general the 
equations that relate the data set Qk to i^fc-i are given by. 




= k = l, 2, ■■■ . (94) 



Calculate el from the data set flk- We get a string of numbers e^, ■ ■ ■ . These 
numbers would increase with iterations and eventually reach a constant value within 
statistical fluctuations beyond certain number of iterations. In other words efj^j^ = ef 
for all > 1 and for some /. In any case the iteration has to stop when the number of 
data points in the set becomes 2. The desired statistical error (one sigma confidence) 
of the calculated mean m is given by e;. 

An immediate issue of concern is the implications of critical slowing down to Monte 
Carlo simulation. The Metropolis dynamics slows down for T close to Tq. The closer T 
is to Tc, the slower does the dynamics become. As we have already seen, this builds up 
temporal correlations and Monte Carlo estimates of the macroscopic properties become 
unreliable. Let the divergence of the integrated correlation time r* with T be expressed 
as 



where z = A/u. The dynamical critical exponent z is greater than unity; in fact, for 
the local update algorithms like the Metropolis, the value of z is nearly 2 and above. 
This tells us that for T close to Tc, the Monte Carlo error becomes large especially 
when the system size L is large. This is indeed a major drawback of the Metropolis 
Monte Carlo technique. Fortunately this problem of critical slowing down can be very 
efficiently overcome by employing the so called cluster algorithms, wherein, a large 
cluster of spins is updated in a single step. Swendsen and Wang [^2] derived a cluster 
algorithm by mapping the spin problem to a bond percolation problem, based on the 
work of Kasteleyn and Fortuin [63j and developed by Coniglio and Klein |64j . 




(95) 



(96) 
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18 What is percolation ? 

Percolation is a familiar term we use in the context of brewing of coffee: Hot water 
injected at one end percolates through packed coffee powder and is collected at the 
other end. In fact percolation was introduced in the year 1957 by Broadbent and 
Hammersley jHS] as a model for the transport of liquid through a porous medium, 
spread of a disease in a community and other related phenomenon. For an excellent 
early discussion on percolation see the little book by Hammersley and Handscomb 
listed under [39j. Let us abstract the notion and consider N x N square lattice of 
sites. Let < p < 1 denote the probability of placing a bond between a pair of 
nearest neighbour sites. Carry out the following experiment. Select a pair of nearest 
neighbour sites; select a random number ^; if ^ < p place a bond between the selected 
pair. Otherwise do not put a bond. Carry out this operation independently on all the 
nearest neighbour pairs of sites in the lattice. When two nearest neighbour sites are 
connected by a bond they form a cluster of two sites. When another site gets bonded 
to one of these two sites then the cluster is of three sites, and so on. At the end, you 
will have clusters of different sizes. If p is small, typically, we expect the lattice to 
contain only small clusters. If p is close to unity, there will exist a cluster that extend 
from one end of the lattice to the other. Such a cluster is said to span the lattice. It 
is called a spanning cluster. Let Pl{p) denote the probability that a spanning cluster 
exists on an L X L square lattice. For a given L it is clear that 

{0 for p ^ , 
(97) 
1 for p ^ I . 

In fact, we find that the percolation probability Pl{p) changes sharply from zero to 
unity at a critical value of p = pc- The transition becomes sharper as ^ oo. In 
other words, 

{0 for P < Pc , 
(98) 
1 for p > Pc ■ 

Thus, there is a geometric phase transition ai p = pc- The value of pc is 0.5 for a 
two dimensional square lattice in the limit N ^ oo. Figure (jHl) depicts Pl{p) versus 
p for various values of L. We can clearly see that the transition becomes sharper 
with increasing system size. Even for L = 50 the transition is already quite sharp. 
The statistical error in any of the data points plotted in Fig. (jH)) does not exceed a 
maximum of ±5%. 

What we have discussed above is called bond percolation, which is of relevance to 
us in the context of discussions on Swendsen-Wang algorithm. There is also site per- 
colation. For more on percolation read the delightful little book by Stauffer [02] • The 
most interesting feature of the phenomenon of percolation in the context of our present 
discussion is that unlike thermal transition, the geometric percolation transition is free 
from critical slowing down. Every sweep over the lattice generates an independent 
configuration and hence the correlation time is exactly zero. This is precisely what 
Swendsen-Wang algorithm makes use of. 
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Figure 8: Pl{p) versus p for bond percolation. The statistical error is less than 5% for all 
the points. 
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19 What is Swendsen-Wang algorithm? How do we 
implement it? 

In the Swendsen-Wang algorithm we start with a spin configuration say C. Select two 
neighbouring sites and call them If Si = Sj, then the bond connecting the sites 

i and j becomes eligible for occupation. Such a bond is often referred to as satisfied 
bond. If Si 7^ Sj, then the bond is not a satisfied bond and hence is never occupied. A 
satisfied bond is occupied with a probability p, to be specified later. This is carried out 
as follows. Select a random number (uniformly distributed between and 1). If it is 
less than p occupy the (satisfied) bond. Otherwise do not occupy the bond. Carry out 
this process for each pair of nearest neighbour sites in the lattice. At the end you will 
have several clusters of sites connected by occupied bonds. Let us call these Kastelyn - 
Fortuin - Coniglio - Klein (KF-CK) clusters. Figure depicts an example of KF-CK 
clusters on a 10 x 10 square lattice. Filled circles denote the up spins and open circles 
denote the down spins. Occupied bond is denoted by line connecting the neighbouring 
'like' spins. The clusters are shaded and the spanning cluster is shaded dark. We have 
used a value of J/ksT = 0.6. The critical value, J/k-QTc, is 0.4487. I must emphasize 
that these bonds are fictitious; there is no energy associated with them. They only 




Figure 9: Kastelyn - Fortuin - Coniglio - Klein (KF-CK) clusters on a 10 x 10 square 
lattice, for the Ising spin system. Filled circles denote up spins; open circles denote down 
spins. Satisfied and occupied bond is denoted by line connecting nearest neighbour 'like' 
spins. The spin-spin interaction strength J/k-^T is taken as 0.6. The critical value J/k-^Tc 
is 0.4407 for a square lattice. The clusters are shaded. Spanning cluster is shaded dark. 
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serve to define a cluster. Having thus formed the KF-CK clusters, assign to each 
cluster a spin variable +1 or —1 independently, randomly and with equal probability. 
All the spins in a cluster acquire the spin value assigned to the cluster. Remove the 
fictitious bonds. What you get is a new configuration of spins. Start all over again 
the process of constructing fresh KF-CK clusters. In other words iterate the whole 
process and generate a chain of spin configurations; the process obeys detailed balance 
and is ergodic; hence the configurations in the asymptotic part of the Markov Chain 
constitute a canonical ensemble of microstates at the desired temperature. Note that 
temperature enters into picture through the bonding probability p — p{T). 



19.1 What is the appropriate value of the bonding probability 
p for a given T ? 

The interaction energy associated with a pair of nearest neighbour spins is eij = 
—JSiSj. The energy is —J when the two spins are aligned and +J, when not. For 
convenience, let us set the minimum of the energy scale at zero by defining Sij — 
—J{SiSj — 1). The Hamiltonian for the spin system can now be written as 



H{C)^-JY: S,{C)Sj{C)-1 



(99) 



and the total energy ranges from to +2JNe where, Ne is the total number of pairs 
of nearest neighbour spins in the lattice. In the above expression for the Hamiltonian, 
let us carry out the sum over pairs of nearest neighbour 'like' spins and of nearest 
neighbour 'unlike' spins separately and get. 



H{C) = 2J 



Ne - B{C) 



(100) 



where B is the number of satisfied bonds (number of pairs of nearest neighbour 'like' 
spins in the spin configuration (microstate) C). The canonical partition function is 
then given by. 



= I^exp 
c 

c 



-2(3J{Ne-B) 



(101) 



where q = exp(— 2/?J). Let us define p = 1 — g. We can select randomly b bonds from 
amongst B satisfied bonds and occupy them. Number of ways of doing this is given by 
fl{b, B) = B\/[b\{B — b)\]. If we interpret p as the probability of occupying a satisfied 
bond, then the probability of having b occupied bonds in a given a spin configuration 
C is P{b,B{C)) = Q{b, B{C))p''q^^^^~\ Notice that Ef=o ^(^ ^) = (P + ?)^ = 1- Then 
the canonical partition function can be written in a suggestive form, given by, 

Z{I3) = ^g^--^(^)(p + 5)^(^) 
c 

= Y.^^^-^j:^Q>^B)p\^-' 

C 6=0 
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= ^ 5: s) (102) 

C 6=0 

From the above it is clear that the correct value of the bonding probability that must 
be used in the Swendsen-Wang cluster algorithm is given by 

j9= l-g = l-exp(-2/3J). (103) 

We can construct the same partition sum in a different way. Start with the lattice 
sans Ising spins. Put randomly b bonds, each connecting a pair nearest neighbour 
sites. Number of ways of doing this is Q{b,NE) = NE\/[b\{NE — b)\]. The probability 
of constructing a single bond is taken as p = 1 — exp(— 2/3J). We call the lattice with 
a specific arrangement of b bonds as a graph G. Let Nc{G) denote the number of 
clusters in the graph G. Note that all the graphs having the same number of bonds 
need not have the same number of clusters. Consider now the graph G having vertices 
equal to the number of lattice sites or Ising spins, Ne edges (equal to the number of 
nearest neighbour pairs of spins) and b bonds. We can now decorate the vertices of G 
with Ising spins. While doing so we would like to be consistent so that the bonds we 
have placed in the lattice are occupied bonds. How many Ising spin configurations are 
compatible with the graph G? It is clear that all the vertices in a given cluster must 
hold 'like ' spins ( all 'up' t or all 'down' |). Hence there are 2^'='^*^^ spin configurations 
compatible with G. The statistical weight of the graph G is pK^) qNE-h{G)2Nc{G) ^ 
can calculate such weights for all the graphs for a given b and then for all b ranging 
from to Ne- The canonical partition function, see Eq. ()102j) . can now be written 
equivalently as a sum of the weights of all possible graphs and is given by, 

Z{(3) = Y^pHG) qN^-KG) 2^ciG) (104) 

G 

Thus the interacting Ising spin problem has been mapped on to a noninteracting 
correlated random bond percolation problem, see [UniESlinilinniEZl • The spontaneous 
magnetization in the Ising model corresponds to the percolation probability in the 
percolation problem; the magnetic susceptibility is analogous to the average number 
of spins per KF-CK cluster; energy and specific heat are like the number of occupied 
bonds and its fluctuations; the correlation length in Ising system corresponds to the 
linear size of cluster in the correlated random bond percolation model. Purely from a 
practical point of view the Swendsen-Wang technique based on KF-CK clusters has 
emerged as a powerful tool in Monte Carlo simulation of large systems. For a 512 x 512 
lattice system the value of r* can be made as small as ten which is some ten thousand 
times smaller than that for the single flip Metropolis algorithm. This completes the 
discussion on Swendsen-Wang algorithm. 

For implementing the cluster update Monte Carlo we need a fast algorithm to 
identify the clusters. An ingenious technique of cluster counting was proposed by 
Hoshen and Kopelman 68 . The technique is straight forward, easy to implement and 
quite fast. 
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20 What is the basis of the Hoshen-Kopelman clus- 
ter counting algorithm? 

The general principle of the Hoshen-Kopelman algorithm is as follows. Consider an 
arbitrary dimensional lattice with arbitrary coordination number. The lattice sites are 
occupied by the relevant species - like atoms or spins. Lattice sites linked by suitably 
defined neighbourhood conditions constitute a cluster. The linkage can be due to 
presence of same type of atoms on neighbouring sites (site percolation ) or the presence 
of a bond between two neighbouring sites (bond percolation ) or any other definition 
dictated by the nature of the problem under investigation. In a lattice there can be 
many clusters. Ideally one can assign distinct labels to the clusters and associate with 
these labels the corresponding cluster sizes. A straight forward procedure would be to 
scan the lattice row by row and then layer by layer. The first time we hit a 'relevant' 
site we assign a label, say 1 to the site and associate with the label a number LL{1) = 1. 
If the next site in the row is linked to this site through the neighbourhood relation, 
then it is assigned with the same label and the array LL is updated, LL{1) = LL(1) + 1 
signifying that the cluster in its process of 'growth' has acquired one more site. On 
the other hand if the next site is not linked to the site previously examined, then it is 
assigned with a new label, say 2 and LL{2) = 1. The process of labelling the sites and 
updating the array LL is continued. A conflict starts when we go down the rows or the 
layer; we come across a site, say X, whose neigbouring sites to which this site is linked 
carry different labels, say /i, I2, I3, ■ ■ ■, /„. Physically this means that the present site 
under examination is the link through which the clusters seemingly carrying different 
labels, coalesce. In other words, the different clusters coalesce at this site to form a big 
cluster. Therefore we would like to assign a single label to all the sites belonging the 
coalesced big cluster. This would mean that we should go over all the previous sites 
examined and relabel them; the associated LL also should be changed accordingly. 
In other words, all the sites carrying the labels (/i, I2, ■ ■ ■ , In) should be assigned a 
common label, say I = Ik = min(/i, I2, ■ ■ ■ , In) where 1 < k < n. Also we must 
set LL{1) = LL{li) + LL{l2) + ■ ■ ■ + LL{ln) + 1. This kind of backward relabelling is 
the most time consuming part of the algorithm. Hoshen and Kopelman avoided the 
backward relabelling completely. The current site X under examination is assigned the 
label / = Ik] However we set LL{li) = —I, V i = l,n and i ^ k. Therefore LL{j) 
denotes the size of the cluster only if it is positive. If it is negative then —LL{j) denotes 
the label of the cluster it would start belonging to from now. A label k is proper if 
LL{k) > 0; if LL{k) < then the label k is called improper. When we come across 
a site whose neighbours carry improper labels then by examining the corresponding 
entries in the array LL, we can find the proper label. All these issues would become 
clear when we consider a concrete example, in bond percolation . 

Consider a two dimensional square lattice. The aim is to assign a cluster label to 
each site so that at the end of scanning we have sites having the same label belonging 
to a cluster; we shall have different clusters having distinct cluster labels. To this end 
we decide to scan the lattice sites from left to right in a row and then go to the next 
row below. We start with the first row, see Fig. [TUl We assign cluster label '1' to 
the first site and set LL{1) = 1. The next site is bonded to the previous site; hence 
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Figure 10: Hoshen-Kopelman cluster counting algorithm: We have to assign a cluster label 
to the site marked X. Before labelling the site X, we have LL{1) = 5, LL{2) = 1, LL{3) = 3, 
LL(4) = 1. The conflicting labels are h = 1 and I2 = 3. We select the lower of the two labels, 
namely h = I as label for the site X. We update LL{1) = LL(1) + LL(3) + 1 = 5 + 3 + 1 = 9. 
We set LL{3) = —li = —1. Thus the label '3' has become improper. The other entries in 
the array LL remain the same. The two clusters 1 and 3 coalesce. 



we assign the same cluster label '1' to this site and update LL{1) = LL{1) + 1. The 
next site (the third site in the first row) is not bonded to the previous site; hence we 
assign the next new label '2' to this site and set LL{2) = 1. We proceed in the same 
fashion assigning labels and updating the LL array. We complete the scanning of the 
first row. We find, see Fig. ITIH the first row assigned with labels 1—4; Also LL{1) = 2, 
LL{2) = 1, LL{3) = 3, and LL{4) = 1. Then we start scanning the second row. We 
consider the first site in the second row and check if it is bonded to the site at its top. 
If yes, then it assigned with the cluster label of the top site. If not, it is assigned with 
the next new cluster label. In Fig. (|TO|l the first site in the second row is assigned 
with the cluster label '1' . While scanning the second row, we need to check if the 
site under consideration is bonded to the site at its left (the previous site examined) 
and/or to the site at its top. If only one of these two sites is bonded to the current site 
then cluster labeUing and updating of the array LL is carried out exactly the same way 
as described earlier. You may come across a situation when the current site is bonded 
to both the sites (site at left and at top) with labels, say li and ^2- If h ^ h, there 
is a label conflict. What is the cluster label we should assign to the current site, li or 
I2 ? Fig. fllO|) illustrates an example where the current site is marked X. Physically this 
conflict implies that what we have considered until now as two different clusters are in 
actuality one and the same. The current site is the link. Such a cluster label conflict 
can be resolved by a second pass or by a backward relabelling scheme. But then this 
would be time consuming. The ingenuity of the Hoshen-Kopelman algorithm lies in the 
fact it avoids completely the second pass or backward relabelling. To understand the 
algorithm see Fig. (fTUj) . The current site marked X is linked to the site at its left (with 
label /i = 1) and also to the site at its top (with label I2 = 3). Thus h{= 1) and l2{= 3) 
are the conflicting labels. Let L = min{li, I2). In the example of Fig. (jlU|) . L = li = 1. 
We assign the label L = /i = 1 to the current site, update LL{L) = LL{L) + LL{l2) + 1 
and set LL{l2) = —L. In the example considered, the current site is assigned label '1' 
; LL{1) = 5 + 3 + 1 = 9 and LL{3) = —1. The label 3 has now become improper since 
LL{3) is negative. Before labelling the site X, the clusters having labels '1' and '3' 
were distinct. While examining the site X we discover that the two clusters are linked 
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Figure 11: Hoshen-Kopelman cluster counting algorithm: We have to assign a cluster label 
to the site marked X. Before labelling the site X, we have LL{1) = 9, LL{2) = 1, LL(3) = —1, 
-LL(4) = 1. The site X is bonded to the site above with label '3' . Hence we should assign 
label '3' to the site X. But label '3' is improper since LL{3) = — 1. We then check the label 
'1' . It is proper since LL{1) = 9 > 0. Hence the root label of 3 is 1. i.e. l§ = 1. We assign 
label '1' to the site X and update LL(1) = LL{1) + 1 = 9 + 1 = 10. 



through the site X and hence are not distinct. The two clusters, so to say, coalesce. 
We give the lower cluster label (L = 1) to the coalesced cluster, store in the array 
LL{L) the size of the coalesced cluster and declare the cluster label 3 as improper by 
setting LL{3) = —1. LL{3) takes the role of a pointer. After labelling the current site 
and updating the LL array we continue with the scanning. There may arise a situation 
when the current site is bonded to a site with an improper label, say h. In other words 
we find LL{li) = —I2 < 0. In such a situation we examine if I2 is a proper label. If 
yes, then the root label corresponding to h is I2. Let us denote the root label of h by 
the symbol ij^. Thus ij^ = h- If on the other hand we find that I2 is again an improper 
label with LL{l2) = —h < 0, the search for the root label continues until we get a 
proper root label //j^. This situation is depicted in Fig. (fTTj). The site marked X is 
bonded to the site above with label li = 3. But the label /i = 3 is an improper label 
since LL{li = 3) = —I2 = — 1 < 0. The label /2 = 1 is a proper label and hence 1^ = 1. 
Hence we assign the label 1 to the current site, update LL{1) = LL{1) + 1 = 10. If 
there arises a cluster label conflict, it is resolved exactly the same way as described 
earlier except that the resolution is done after searching for the corresponding proper 
root labels. A situation of this type is depicted in Fig. (fT^ . We have to assign a cluster 
label to the site marked X. Before labelling the site X, we have LL{1) = 10, LL{2) = 1, 
LL{3) = -1, LL(4) = 3, LL(5) = -4, LL(6) = 5. The conflicting labels are h = Q and 
I2 = 5. The label /i = 6 is a proper label. The label I2 = 5 is however an improper label, 
since LL{5) = —4. Hence we examine the label 4. It is proper since LL{A) = 3 > 0. 
Therefore Ij^ = = 6 and Ij^ = l§ = 4. We assign the smaller label '4' to the site, X 
update LL(4) = LL(4) + LL(6) + 1 = 3 + 5 + 1 = 9, and set LL(6) = -5. The label '6' 
has now become improper. In this fashion we complete scanning the entire cluster. In 
the end we shall have an array {LL{i) : z = 1, 2, ■ ■ ■}. The positive entries in this array 
denote the sizes of the different clusters present in the system. The Hoshen-Kopelman 
algorithm is quite general and can be easily employed to carry out cluster counting in 
a general d- dimensional lattice with arbitrary coordination number. The boundary 
conditions can also be easily taken care of in cluster counting. In the examples discussed 
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above we have not considered any boundary conditions. Babalievski [d9] has extended 
the Hoshen-Kopelman algorithm to random and aperiodic lattices. Hoshen, Berry 
and Minser 170"^ have proposed an enhanced Hoshen-Kopelman algorithm that enables 
efficient calculation of cluster spatial moments, radius of gyration of each cluster, cluster 
perimeter etc. Indeed more recently Ahmed Al-Futaisi and Patzek have extended 
the Hoshen-Kopelman algorithm to non-lattice environments [7T]. We can say that 
the Swendsen-Wang cluster algorithm coupled with the Hoshen-Kopelman counting 
constitutes a major step forward in Monte Carlo simulation. 
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Figure 12: Hoshen-Kopelman cluster counting algorithm: We have to assign a cluster label to 
the site marked X. Before labelling the site X, we have LL{1) = 10, LL{2) = 1, LL{3) = — 1, 
LL(4) = 3, LL(5) = -4, LL(6) = 5. The conflicting labels are h = 6 and I2 = 5. The 
label /i = 6 is a proper label since LL(6) = 5. The label I2 = 5 is an improper label, 
since LL{5) = —4. Hence we examine the label 4. The label '4' is a proper label since 
LL(4) = 3 > 0. Therefore the root labels are Iq = 6 and = 4. We assign label '4' (the 
smaller of the two root labels) to the site X, update LL(4) = LL(4)+LL(6) + 1 = 3+5+1 = 9, 
and set LL{6) = —5. The label '6' has become improper. 



21 What are the improvements to the Swendsen- 
Wang cluster algorithm? 

21.1 Wolff algorithm 

In the Swendsen-Wang algorithm, all the clusters, big or small are grown and updated. 
Small clusters do not influence the critical slowing down and hence computing efforts 
toward growing them are a waste. This is somewhat eliminated in the algorithm 
proposed by U. Wolff [72]. A site, say '2' is chosen randomly and a single cluster is 
grown from it as follows. If a neighbouring spin has the same orientation then the 
bond to it is 'satisfied' and hence is occupied with a probability p given by Eq. ()1()H|1 : 
Select a random number; if it is less than p occupy the bond; if not do not occupy. 
This process is repeated on all the neighbours of the site i and further on all the 
neighbours of the new sites that get bonded. The set of bonded sites constitutes a 
cluster and the cluster grows. The growth of the cluster would eventually terminate. 
Then all the spins in the cluster are flipped; the bonds are removed; we have a new spin 
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configuration. The whole process is iterated. The measurement of Monte Carlo time 
becomes problematic in the Wolff algorithm since the number of spin-flips is different 
in different steps. Usually the time advanced in a cluster flip is taken as the ratio of 
the number of spins flipped to the total number of spins in the system. 

The Wolff algorithm reduces significantly the correlation times and the dynamical 
critical exponent z. Hence for large lattices the Wolff algorithm would prove much 
better than the local update Metropolis algorithm. 

An important point about the Swendsen- Wang/Wolff algorithm is that a KF-CK 
cluster percolates at the Ising critical point. Indeed one can show that the spin- 
spin pair correlation function (SiSj) is equal to {jij) - the probability that the sites 
i and j belong to the same cluster. Here 7,^ is an indicator function: it is unity if 
the sites belong to the same cluster and zero otherwise. The average in the above is 
taken over all spin and bond configurations. This property suggests that there is a 
natural tendency for the system to self organize itself to criticality; this tendency for 
selforganized criticality can be captured in an algorithm. 

21.2 Invaded cluster algorithm 

Machta et al [731 proposed the invaded cluster algorithm to calculate the equilibrium 
critical points. This technique is based on the idea of invasion percolation [T^ . In 
invasion percolation we assign random numbers independently to the bonds of the 
lattice. Growth starts from one or several sites. At each step a cluster grows by the 
addition of the perimeter bond which has the smallest random number. The growth 
process stops when a cluster percolates. It is clear that the invasion percolation is 
a self organized critical phenomenon. In the algorithm of Machta et al |7H] we grow 
clusters from every site of the lattice and along the 'satisfied' bonds only. Bonds are 
occupied in a random order until an appropriate stopping condition is fulfilled. The 
growth process is stopped and each cluster is flipped randomly and independently with 
equal probability. This gives rise to a new spin configuration; in the new configuration 
re order the satisfied bonds randomly and start growing invaded clusters. Iterate. 
For example the stopping condition may be a requirement on the size of the largest 
cluster. The stopping rule can be devised suitably so that the Invaded Cluster algorithm 
simulates the system at critical point of the model. Thus the occupied bonds are 
determined by the stopping rule. Hence the invaded cluster algorithm has a natural 
feed back mechanism which ensures that the desired equilibrium critical regions are 
always reached. The phase transition temperature is an output. For more details 
see jTSI- 

21.3 Probability changing cluster algorithm 

Tomita and Okabe [7^] have proposed an algorithm wherein the system evolves to 
critical regions but asymptotically remains still a canonical ensemble. They call this a 
probability changing cluster (PCC) algorithm. The bonding probability is adjusted at 



^°Note that in the Swendsen- Wang algorithm it is the temperature that determines the occupation 
of satisfied bonds. See sections (|f 9|l and Ijiy.f |l . 
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the end of every step or a group of steps so that the spin system is driven to criticality. 
We start with an Ising spin configuration, and a certain value of the bonding probability 
p. As in the Swendsen-Wang algorithm, we grow the KF-CK clusters; then check if 
there is present a spanning cluster {k = 1) or not [k = —1). Carry out the cluster- 
spins-flipping exactly like you did in the Swendsen-Wang algorithm. For the next step 
change p to p — k6p. Iterate. It is clear that in the limit 6p 0, the PCC ensemble is 
identical to the canonical ensemble. Hence the algorithm not only drives the system to 
criticality and hence the critical points can be estimated but also the physical quantities 
like magnetization, susceptibility, specific heat, etc can be evaluated. For more details 
see [7S] 

21.4 More on clusters and some discussions 

I think it is rather clear now why the Swendsen-Wang algorithm in particular and 
several other cluster algorithms (like the Wolff, invaded cluster and the probability 
changing cluster algorithms, discussed above) in general are very effective in speeding 
up the computations; this stems from a very simple observation that a KF-CK cluster 
contains correlated spins; if a spin is flipped then all the other spins in the same KF-CK 
cluster have a tendency to flip. It is precisely this tendency for coherent flipping that 
is exploited when we flip all the spins in a cluster in one shot thus speeding up the 
dynamics. By the same token, in a physical phenomenon if a KF-CK cluster ceases to 
represent correlated spins, then the Swendsen-Wang and the related algorithms would 
fail. An example is the temperature driven first order phase transition studied by Gore 
and Jerrum ^j; they show that the Swendsen-Wang cluster algorithm is not efficient. 
The so called super critical slowing down of dynamics in a first order phase transition 
has its origin in the nature of the canonical distribution: it peaks at two phases and the 
system switches from one to the other through a rare 'interface event' . The correlation 
time T* is proportional to the residence time in the phases. The 'interface' regions 
are poorly sampled in the local update Monte Carlo algorithm. To handle such super 
critical slowing down, we need to go beyond Boltzmann sampling, like multicanonical 
Monte Carlo. I shall say something on these issues later. 

For another example consider the following disordered Ising Hamiltonian 

H = -Jj2{e,,S,Sj-l) , (105) 

where ejj(= ±1) is random quenched disorder. If there exists a closed path such that 
that the product of the disorder parameter e over the closed path is -1, then the 
Hamiltonian is said to contain frustration. A naive application of Swendsen-Wang 
cluster algorithm to a fully frustrated Ising system will not lead to any improvement 
whatsoever [77] ; it is because, in a frustrated Ising system, a KF-CK cluster does not 
contain correlated spins. One needs to modify the cluster definition to handle these 
situations. Several ideas have been proposed to this end and we refer to some of them 
in HE]. 

We need to carry out a series of Monte Carlo simulations at different tempera- 
tures to obtain the temperature dependence of the desired thermodynamic variable. 
This can be time consuming. It would be desirable to have a technique that would 
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give the temperature dependence of a macroscopic variable from a single Monte Carlo 
run. Also, in the standard Metropolis Monte Carlo technique, it is rather difficult to 
calculate entropy (or free energy), since the entropy is a function of the probability 
with which a microstate occurs in a canonical ensemble. The energy distribution in 
the canonical ensemble, P{E) oc D{E) exp{—PE), is the product of density of states 
D{E) which usually increases with increase of energy rather rapidly and an expo- 
nential which decreases rather sharply with increase of energy at low temperatures. 
As a result, microstates with very low or high energies get sampled rather poorly in 
Metropolis algorithm. If the energy minima are separated by high barriers, the sim- 
ulation becomes difficult since the system becomes quasi ergodic. Several techniques 
have been proposed addressing these and related issues. I shall discuss some of them 
in the remaining part of the talk. First let us consider the so called single histogram 
technique often used in the context of simulation of critical systems. 



22 What is an histogram technique ? How do we 
implement it? 

Consider a canonical ensemble of spin configurations generated at inverse tempera- 
ture 13*. The idea behind histogram technique is to calculate from this ensemble the 
macroscopic properties of the system at /? 7^ Let me illustrate the technique by 
considering energy histogram. 

Accordingly, let hi{f3*) denote the histogram of the energy obtained from a canonical 
ensemble of spin configurations generated at (3*. In other words, hi[j3*) is the number 
of spin configurations with energy in the interval AE around Ei. Let P{Ei, (5*) denote 
the probability density that the system takes an energy Ei. It is clear that P{Ei,i3*) 
is proportional to hi{(3*). Formally we have 

P{E^, n = ^^D{E,) exp {-(3*E;) , (106) 

where D{Ei) is density of states at energy Ei and is independent of temperature. 
Therefore we have 

D{E,) oc Z{l3*)h,{l3*) exp {(3*E,) . (107) 
Let P{Ei,P) denote the probability density of energy at /? 7^ P*. Formally we have 

P{Ei, P) = ^^D{Ei) exp {-(3Ei) . (108) 
In the above, we substitute for D{Ei) from Eq. (jl07|) and get 

P{E., f3) oc ^^Un exp [- (/? - n Ei] . (109) 
Imposing the normalization condition, we get 

^ " h,m exp [- {P - /?*) E,] ■ ^'''^ 
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Thus from a single Monte Carlo run at (3* we obtain P{E,i3) for various values of 
j3 ^ (3* . Any energy dependent macroscopic quantity as a function of (3 can now be 
obtained from the knowledge of P{E,(3). 

Another way of saying the same thing is as follows. We sample a microstate C 
from a probability distribution exp[— /3*£^(C)]/Z(/3*). We say that the microstate C is 
generated with a weight exp[— /5*£'(C)]. We are interested in evaluating the average of a 
macroscopic quantity 0{C) over a canonical ensemble dX (3 ^ (3* i.e. from a distribution 
exp[— Hence we unweight the sampled microstate and then reweight it 
at the desired temperature; the unweighting - reweighting factor for the configuration 
C is 

exp[-/5E(C)] 



exp[-/3*E(C)] 

Therefore, 



;iii) 



inx r> E^iQ(C.)exp[- {f^-^EjC,,, 



where the sum runs over the microstates sampled from a canonical ensemble at (3* . If 
0{Ci) = 0{E{Q)), then, 

where the sum runs over the energy bins. The above is identical to energy histogram 
reweighted average. 

The histogram technique is one of the oldest techniques proposed, see for exam- 
ple jZniEO]- But the technique became popular after the publication of a paper by 
Ferrenberg and Swendsen [HI] in 1988. This technique is often referred to as Ferrenberg- 
Swendsen reweighting technique and is used in almost all Monte Carlo calculations of 
statistical physics problems, especially the ones dealing with the phenomenon of phase 
transition, see for example jH2]- 

In the single histogram technique, the estimated T) is accurate only for T 
close to the reference temperature T*. By generating many histograms that overlap we 
can widen the range of T. This is called the multi histogram technique It is also 
clear that we can increase the range of T by directly estimating the density of states 
D[E). Multicanonical sampling |H1] is an early technique proposed that precisely 
does this. Multicanonical sampling is a very general technique useful and often the 
method of first choice for a variety of problems that include critical slowing down 
near second order phase transition points, nucleation in first order phase transition, 
and trapping in the metastable minima in systems with rugged energy landscapes. 
Originally this technique was proposed to tackle the so called super critical slowing 
down in the first order phase transition and calculation of the interfacial energy for 
large systems. This problem of super critical slowing down arises due to the fact 
that the Boltzmann factor suppresses configurations dominated by interfaces between 
ordered and disordered phases. As a consequence, the quality of the local update 
Metropolis Monte Carlo results deteriorates exponentially with increase of the system 
size. Entropic sampling jHJ^ is a technique conceptually equivalent to multicanonical 
Monte Carlo. 
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23 What is the basic idea behind entropic samphng 
? 



The canonical partition function is given by, 

-S{E) 



(3E 



(114) 



where S{E) is the microcanonical entropy. The probabihty for the macroscopic system 
to have an energy E in the canonical ensemble is given by, 



P{E) oc exp 



S{E) 



-(3E 



(115) 



The Metropolis algorithm helps you sample from the above Boltzmann distribution 
and construct a canonical ensemble of microstates. Instead, if we want to sample from 
an arbitrary distribution 



Pa{E){E) OC exp 



S{E) 
k-R 



a{E) 



;ii6) 



it is sufficient to impose, besides ergodicity, a detailed balance condition given by 



W{A ^ B) 



PciE) {E{A)) 



exp 



a{E{B)) -a{E{A)) 



;il7) 



This implies that a trial configuration Ct obtained by a local updating of the current 
configuration C, is accepted with a probability. 



P 



mm 



(^l,exp[-|a(E {Ct))~a{E (C)) | 



:il8) 



It is clear that if a{E) = (3E, Eq. (jll6|) is the same as Eq. (jll5|) and we recover the 
conventional Boltzmann sampling. For other choices of the function a{E), we get what 
one may call non-Boltzmann sampling. The fact that non-Boltzmann sampling can be 
a legitimate alternative to Boltzmann sampling has been recognized since the early 
days of Monte Carlo practice. However the practical significance of non-Boltzmann 
sampling was realized, only in the middle of nineteen seventies for the first time, by 
Torrie and Valleau jSUllH^ who proposed the umbrella sampling technique, a forerunner 



to all the non-Boltzmann sampling technique like the entropic sampling, multicanonical 
Monte Carlo and its several and recent variants, see below. 

Entropic sampling obtains when we set a{E) = S{E)/k-B that renders Pa(E){E), see 
Eq. ()116|) . the same for all E. But then, the entropic function is not known d priori. 
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An algorithm to build a{E) iteratively on the basis of microstates visited during short 
'learning' runs is usually recommended, see below. 

The key point is that entropic sampling gives an ensemble of microstates with 
energies uniformly distributed over its range. We can say that the algorithm generates 
a simple random walk in energy space, and hence the presence of energy barriers does 
not affect the evolving Markov chain of microstates. Since unweighting - reweighting is 
done while averaging, even a reasonably approximate estimate of the entropic function 
a{E) would suffice. The idea is that the closer a{E) is to S{E)/k-B, the flatter shall be 
the energy histogram it generates. 

Let (-Emin, -Emax) bc the energy range over which you want to generate microstates 
with uniform energy distribution. For example this range could include the region of 
the sample space where the probability is small like the interface region in a flrst order 
phase transition. Divide the range into a number of equal width energy bins. Let Ei 
be the energy of the 2— th bin. In the flrst iteration set ctj = a{Ei) = V i Carry 
out a small Monte Carlo run, of say N MCSS. The probability of acceptance of a trial 
state constructed from locally changing the current state is given by Eq. ()118|) . Since 
in the flrst iteration, = V every trial move gets accepted. But this will not be 
so in subsequent iterations. From the microstates visited by the system in the flrst 
iteration stage, calculate the histogram {hi}, where the number of microstates falling 
in the j— th energy bin is denoted by hj. At the end of the flrst iteration, calculate 
{ai} for the next iteration, as per the recursion given below, 

a?^ if hi = 

(k+i) 



ar'' = { ,„ ^ (119) 



+ T^- In [hi] otherwise. 



where the superscript (k) denotes the iteration index. 

Thus, starting from an initial array of {af'^ = V i } in the zeroth iteration, we 
successively calculate : k = 1, 2, ■ ■ -j. We continue the iterations until we 

flnd that the histogram of energy accumulated from the microstates visited during 
the iteration is approximately flat in the desired energy {Emm, Emax)- After the 
'learning' runs are over, we carry out a long Monte Carlo run, employing the set {ai}, 
for calculating the macroscopic properties. We get an entropic ensemble of microstates 
{Ci : i = 1, A^}. Calculate the canonical ensemble average of a macroscopic property 
say, 0(C) at the desired temperature T = 1/[/cb/3], from the entropic ensemble, by 
unweighting (divide by exp[— a(i?(Cj))]) followed by reweighting (multiply by exp[- 



Ei=i exp [-^EiCi) + a[E{Ci))\ 



Let ^{(3) denote the maximum of {—(3E{Ci) + a{E{Ci)) : i = 1, A^}. While reweighting, 
we have found it necessary to employ the modifled but equivalent formula below, when 
the system size is very large. 



in\ - n _ 0{Ci) exp [-mQ + o^{E{Q) - m] 

Ei=i exp [-pEiCi) + a[E{Ci)) - ^(/?)] 
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Let me repeat: since unweighting - reweighting is any way carried out, there is no 
need to estimate the entropic function {aj} accurately. Suffice if the algorithm ensures 
a relatively flat histogram of energy in the desired range. It is a good idea to increase 
the number of MCSS in an 'a update iteration' as the energy range covered becomes 
larger. The important point is that in one shot we get (O) at 'all' temperatures, in 
addition to sampling of rare events. 

I said earlier that in principle entropic sampling ^| (described above) is the same 
as multicanonical sampling technique [HI] proposed earlier. 



24 How is entropic sampling related to multicanon- 
ical sampling ? 

That entropic sampling is in principle the same as multicanonical sampling has been 
shown lucidly by Berg, Hansmann and Okamoto jHZ|- The probability of energy in 
multicanonical sampling technique is usually parametrized as. 



P{E) ~ exp 



(122) 



The parameter (3 in the above is given by the derivative of the microcanonical entropy. 

It is precisely because of this the technique is named multicanonical sampling: a mi- 
crocanonical temperature is defined for each energy. The second parameter a{E) can 
be expressed as 

a{E) = P{E) F{E) , (124) 

where F{E) is microcanonical free energy. 

The two parameters a{E) and l3{E) are related: 

a{E) = -^+P{E)E . (125) 
Differentiating the above with respect to E, we get, 

^ = E^. (126) 
dE dE ^ ' 

We have to build up the function f3{E) recursively like we built a{E) in entropic 
sampling. It is clear that the functions a{E) of entropic sampling, P{E) and a{E) of 
multicanonical sampling are related. Note that a{E) = S{E)/kB- Hence, 

a{E) = f3{E)E-a{E) 



= m. (127) 



Monte Carlo for Statistical Physics 



57 



The recursion for updating l3{E) in multicanonical sampling Monte Carlo can be de- 
rived from the recursion, see Eq. ()119|) . for updating a{E) in the entropic sampling 
Monte Carlo and is given by, 



kB AE 



X < 





if hi ^ and hi- 


-1=0 


ln(l//i.-i) 


if hi = and hi. 


-17^0 





if hi = hi_i = 




In \,^^ 1 


if hi ^ and hi. 


-17^0 



:i28) 



where AE = Ei — Ei_i, the suffix i denotes the energy bin and {hi} the energy 
histogram . The corresponding {ap^^''} follows immediately, 

In the above we set aimax = where imax is the index of the bin of highest energy. 

Indeed, one can say that the basic idea behind multicanonical Monte Carlo (and 
hence entropic sampling) of the nineteen nineties is the same as that of the umbrella 
and adaptive umbrella sampling of the nineteen seventies. But then it is the work 
on multicanonical sampling and its remarkable success in the accurate evaluation of 
the interface tension in the first order phase transition that made popular such non- 
Boltzmann sampling cum iterative techniques and initiated large scale applications and 
several new developments. These include fiat /broad histogram sampling [HH] , transition 
matrix Monte Carlo (HH], exchange Monte Carlo simulated tempering [HI] and 
cluster hybrid algorithms For a comprehensive review see [93j. Multicanonical 

sampling and its variants have been employed in a variety of studies that include phase 
coexistence in Ising models jHl], Potts spin models [^2] liquid- vapour [^n] and solid - 
solid inZj models, systems with complex free energy landscape , protein folding (HH] 
and Nematic - Isotropic transition in bulk and confined liquid crystal systems |lUUj 
etc., to name a few. 

Spanning of energy space in entropic / multicanonical Monte Carlo algorithms 
and for that matter in broad / fiat histogram methods, is achieved through a pure 
diffusive process on the one dimensional energy axis. Larger the system, wider is 
its energy range and hence longer it shall take for the system to diffuse from one 
end of the energy range to the other. The diffusive evolution of the Markov chain is 
a consequence of the penalty imposed when the system attempts to visit oft-visited 
energy bins. Also the penalty in a learning run depends on the microstates visited 
during the immediate preceding learning run. Perhaps increasing the penalty at least 
during the initial stages and applying the penalty continuously during the stochastic 
dynamical evolution of the Markov chain would speed up the convergence. To this end 
Wang and Landau |lUlj proposed a simple, easy-to-understand and easy-to-implement 
algorithm that substantially improves the performance of the multicanonical Monte 
Carlo simulation. 
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25 What is the Wang - Landau algorithm? 

In the Wang - Landau algorithm the entropic function a{E) is directly and 

continuously updated after every MCS during the stochastic dynamical process. The 
energy histogram is accumulated but not used in the updating of the entropic function. 
Instead the updating of the entropic function is carried out as follows. We define 
a convergence factor / and set it at / = /o > in the beginning. Also we set 
ai = a{Ei) = V 2 in the beginning. Start with an arbitrary microstate and construct 
a trial microstate by selecting a spin randomly and flipping it. The acceptance/rejection 
of the trial state is made on the basis of Eq. ()118|) . If the energy of the next microstate 
lies in the i-th bin, we set = + f. The histogram of visited sites is also updated; 
continue the evolution until the histogram becomes reasonably flat: Let 

■1 M 

ih) = M^h, (130) 

i=l 

denote the average number of entries in the histogram where M is the number of energy 
bins. We say the histogram is fiat if 

hi > t {h) Mi (131) 

where e is taken between 0.75 and 0.95 depending on the problem. Thus the entropic 
function and the histogram are updated after every MCS. The energy histogram simply 
plays a passive role and serves as a tool to monitor the convergence. Then we reset 
/ = /i = /o/2 and proceed. A fresh histogram is accumulated and tested for flatness. 
In the next stage / = /2 = /i/2. It is clear the convergence factor / decreases 
exponentially to zero as the number of iterations increases. In fact after twenty five 
iterations the convergence factor becomes as small as 3 x 10~^ /o whence we stop the 
simulation process, /o is usually talcen as unity. The resultant entropic function will 
be very close to the true one when normalized on the basis of known information about 
the system under simulation. For example, in the Ising model, the total number of 
microstates is 2^ where V is the number of Ising spins in the system under investigation. 
Also the ground state of the Ising model is doubly degenerate and this can also be used 
for normalization purposes. From the normalized entropic function we can calculate the 
free energy and all other desired macroscopic properties of the system under simulation. 

We can employ the unnormalized entropic function in a final production run (when 
we do not update the entropic function nor do we monitor the energy histogram) and 
construct an adequately large entropic ensemble, like we did in the production run un- 
der entropic/multicanonical Monte Carlo, see section (23). From the entropic ensemble 
we can calculate the desired macroscopic properties at all temperatures through un- 
weighting and reweighting. The Wang - Landau method has been applied to Ising and 
Potts spin models [l(THll()2j . quantum mechanical models [103^, glassy systems jl()4j 
polymer films |105j . protein folding |l()(ij . models with continuous degrees of free- 
dom jl07j and generalized to reaction coordinates |108j . 

There are a few difficulties with the Wang-Landau algorithm and there are a few 
issues that remain to be understood. The statistical accuracy of the results can not 
be improved beyond a certain value, no matter how much additional calculations you 
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perform. The relation between the convergence factor and the errors in the estimates of 
the macroscopic quantities is not clear; also it is not clear how flat the energy histogram 
should be for accurate evaluation of the macroscopic quantities. The algorithm does 
not satisfy detailed balance condition until the convergence factor / decays close to 
zero. Also during the late stages of iteration, the visited microstates do not contribute 
to the estimate of the entropic function since the convergence factor / is very small; 
this means that a lot of information is generated but not used. Consider the situation 
when we employ the Wang-Landau algorithm to estimate the density of states in a 
restricted range of energy. What should we do when the transition takes the system to 
a microstate whose energy falls outside the range? Ofcourse we reject the transition; 
we do not update either the entropy function or the energy histogram. We proceed 
with the dynamics and randomly select the next trial microstate. When we do this 
we find the calculated density of states in the boundary energy bins to be erroneous 
|lUlUlU9j . However if we update these two quantities in the current energy bin, the 
errors disappear TTU^. The reason for this is not clear. See also for some recent 
effort toward understanding and updating of the Wang-landau algorithm. 

The Wang-Landau algorithm seems to hold great promise. It would indeed worth 
the effort to put the algorithm on a more rigorous pedestal and devise means of improv- 
ing its performance through new schemes or by linking it to other existing schemes. 
For example, already Wang - Landau algorithm has been combined with the n-fold 
way |l()9j for estimating tunneling times in Heisenberg spin models and with multi- 
bondic method |112j for general discrete models. 

26 Jarzynski's equality 

Estimation of entropy or free energy from simulation or experimental data has always 
remained a tricky problem. A quantity like energy or magnetization of a macroscopic 
system can be calculated as an average over a reasonably large sample of microstates 
drawn from an equilibrium ensemble. However, for estimating entropy or free energies, 
we have to necessarily consider all the microstates accessible to the equilibrium sys- 
tem and this number is indeed very large. An early ingenious method of calculating 
entropy from data on dynamics was proposed by S. K. Ma |113j . and it is based on 
counting of coincidence (or repetition) of states along a long phase space trajectory. 
We have already seen that umbrella sampling and its variants and later improvements 
like entropic sampling, multicanonical sampling, and Wang-Landau algorithm are some 
of the useful Monte Carlo techniques that can be adapted for computation of entropy 
and free energy. In fact, the very formulation of thermodynamics and the definition of 
entropy and free energies provide us with a method of calculating these quantities, see 
below. 

Consider a classical macroscopic system in thermal contact with a heath bath at 
temperature T. Let A denote a degree of freedom of the system which can be controlled 
from outside; for example, the system can be a gas contained in a cylinder and the 
degree of freedom A can be its volume which can be controlled from outside by moving a 
piston; the system can be a spin lattice and the parameter can be an external magnetic 
field whose strength and direction can be changed. To begin with, at time t = 0, let 
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A = Ai and the system is in equilibrium with the heat bath. Then switch the value 
of A from Ai to A2. What is the change in entropy? Without loss of generality let us 
assume that the switching of A from Ai to A2 is carried out over a time duration tg at 
a constant rate (A2 — Xi)/ts- 

Consider first an ideal scenario in which the switching is carried out infinitely slowly; 
in other words ts = oo] the system is sort of dragged through a continuous succession of 
equilibrium states; during the entire process of switching, the system is continuously in 
equilibrium with the heat bath; such a process is often called a quasi-static equilibrium 
process; the process is reversible. Under such an ideal thermodynamic process, Clausius 
defines free energy (change) AF = F{X2) — -F(Ai) as equal to the reversible work done 
on the system. 

What happens when the switching process takes place over a finite time duration, 
i.e. tg < 00 ? The system would all the time be lagging behind the corresponding 
equilibrium states; the work done will depend on the particular microstate at which the 
system and the heat bath happen to be present at the precise moment the switching 
process starts. Hence the work done on the system would differ from one switching 
experiment to the other. In other words for tg < 00, the work done on the system 

is a random variable; carrying out a switching experiment and measuring the work 
done is equivalent to sampling W independently and randomly from its distribution 
p(W;ts). In other words p(W,ts)dW is the probability that the work done on the 
system during the switching process lies between W and W + dW. Let (W) denote 
the average work done on the system. 



In the ideal quasi-static equilibrium limit of tg — > 00, we have p{W,ts — > 00) = 6{W — 
Wr); W does not change from one experiment to the other and it equals Wr. The 
reversible work Wr equals AF, which is the definition of free energy (change) given by 
Clausius. 

However such reversible thermodynamic processes, very useful though, are ideal- 
izations and are not strictly realized in experiments or simulations. In general we 
have (W) > AF, with equality holding good for reversible processes. The difference 
(W) — AF is called the (irreversible) dissipative work and is denoted by Wd- Thus, to 
calculate AF from experiments/simulation, we need a good estimate of dissipation, in 
the absence of which, we can, at best provide only an upper limit: AF < (W). 

In the year 1951, H. B. Callen and T. A. Welton |115j showed that if the switching 
process keeps the system all the time very close to equilibrium, then the dissipation 

^^Reversible processes are an idealization; however, in experiments or in simulations, we can be as 
close to the ideal reversible process as possible; change A extremely slowly through a succession of 
infinitesimal increments; estimate the work done during each increment; sum these up and equate the 
result to the free energy change. There are slow growth algorithms, staged thermodynamic integration 
methods and thermodynamic perturbation methods to calculate/measure free energies. I do not intend 
to describe these techniques here and those interested can consult for example |114j . 

^^All the switching experiments are carried out with the same protocol: set A = Ai; let the system 
attain thermal equilibrium with the heat bath; start the clock: t — 0; switch A from Ai to A2 uniformly 
until the clock reads t = t^; measure the work done on the system during the switching process; reset 
A to Ai. 




(132) 
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Wd is proportional to fluctuations: 

Wd = {W)-Wr ^ (3 (133) 
where the fluctuations, a^r is given by, 

J dW W'^p{W; ts) - {W)^ (134) 



a 



w 



Recently, in the year 1997, C. Jarzynski, (116j discovered a remarkable and exact 
identity, valid even when the switching process drives the system far from equilibrium. 
Jarzynski's identity relating nonequilibrium work to equilibrium free energy differences 
is given by, 

(exp {-(3W)) = J dW exp {-f3W) p{W,ts) = exp (-/3 AF) . (135) 



Jarzynski's equality has been rigorously proved for Hamiltonian evolution, |116j 
as well as stochastic evolution |117j : its validity has been established in computer 
simulation |117j and more importantly in experiments |118j . 

Let us express Jarzynski's equality as a cumulant expansion jll9j . 



(exp(-/3iy)) = exp 



.n=l ^ 



exp(-/3AF) (136) 



where C„ denotes the n— th cumulant of W. The cumulants and the moments are 
related to each other. The n-th cumulant can be expressed in terms of the moments of 
order n and less. Given below are flrst four cumulants expressed in terms of moments. 

C, = {W), 

C, = {W') - (W)' = , 

Cs = (W^) -3{W^){W) + 2{Wf , 

C4 = {W^) - 4{W^) {W) - 2{WY + 12{W^) {W}^ - 6{W)^ . (137) 
From the cumulant expansion, see Eq. (jl36p . we get, 

AF = (W^) - ^ + V il^l!^ (138) 
2 ^3 ^- 



^^It may be noticed that since the exponential function is convex, 

(exp(-/3W^)) > exp(-/3 (W)) 

which in conjunction with Jarzynski's identity imphcs that, 

exp{-pAF) > exp(-/3(T/F)) 
~PAF > -/3{W) 
AF < (W) 

which is a statement of the second law, if we modify the switching protocol and let the system 
equilibrate at A = A2 at the end of the switching process. In this sense, proof of Jarzynski's identity 
is a proof of the second law 
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Consider a quasi-static equilibrium switching process; p {W) = 6 {W — Wr), by defini- 
tion. Then, in the above, only the first term (of the cumulant expansion) is non-zero. 
We get the definition of free energy as given by Clausius: (W) = Wr = AF. 

Now consider a switching process, during which the system remains very close to 
equilibrium; it is reasonable to expect the statistics of W to obey the Central Limit 
Theorem ; hence p{W) shall be a Gaussian; for a Gaussian, all the cumulants from 
the third upwards are identically zero; hence in the above expansion only the first 
two terms survive and we get AF = {W) — PC2/2 = {W) — This result is 

identical to the fluctuation dissipation relation of Callen and Welton |115j . However, 
if the switching process drives the system far from equilibrium, the work distribution 
would no longer be Gaussian and we need to include contributions from higher order 
cumulants to calculate the dissipation Wd and hence free energy: AF = {W) — Wd 

Jarzynski's equality is helpful to experimenters for measuring free energies; the 
experimenters do not any more need to keep the system at or close to equilibrium 
during the switching process. Jarzynski's identity is a powerful tool for calculating free 
energy, when suitably incorporated in a Monte Carlo or Molecular dynamics programs. 
A dynamical ensemble of nonequilibrium trajectories all starting off from an equilibrium 
ensemble is all that is required for free energy computations. 

Jarzynski's equality forms a part of recent exciting and important developments in 
nonequilibrium statistical mechanics. These new developments, related to each other 
and derivable one from the other, constitute what has come to be known as Fluctua- 
tion Theorems |56llll61lll7[lll8lll21j and have led to fresh insights into nonequilibrium 
statistical mechanics, reversible and irreversible processes and the second law of ther- 
modynamics. 

27 Epilogue 

All things, good and bad, must come to an end; so must, our Monte Carlo journey, 
good or bad. I have taken you, randomly though, through the lanes and by-lanes of 
the Monte Carlo metropolis, confining to those regions relevant for statistical physics 
applications. Table (6) lists what I consider as milestones in Monte Carlo statistical 
physics. 

In the first leg of our journey we looked at ensembles in general and Gibbs ensembles 
in particular. We saw details of microcanonical ensemble that describe isolated system; 
of canonical ensemble that models closed system; and of grand canonical ensemble, 
useful in the study of open systems. Then we took up the issue of random numbers 
that fuel the Monte Carlo machine. To our great discomfort we discovered that even 
today we do not know when can we call a sequence of numbers random except when 
we know of its origin. This of course did not deter us. We took a pragmatic attitude 
and settled for pseudo random numbers. We saw how to generate them and how 
to test them. We learnt how to use them to generate ensemble of random numbers 
representing the desired distribution - the so called random sampling techniques. We 
saw explicitly some techniques that help sample from the exponential and Gaussian 
distributions. For purpose of illustration we saw how to calculate an integral by the 
method of Monte Carlo. We learnt of Monte Carlo error bars and how to estimate 



Monte Carlo for Statistical Physics 



63 



Table 6: Milestones in Monte Carlo Statistical Physics 



When? 


What happened? 


Who did it and where do I find it? 


1951 


Linear congruential (pseudo 


D. H. Lehmer [21 




random number) generator 




iyoo 


Metropolis algorithm 


N. Metropolis, A. W. Rosenbluth, 






M. N. Rosenbluth, A. H. Teller and 






E. Teller |4Sj 


1963 


Glauber dynamics 


R. J. Glauber ^ 


1964 


Hand Book on Monte Carlo 


J. M. Hammersley and D. C. Hand- 




Methods 


scomb jSni 


1969 


Random cluster model 


P. W. Kasteleyn, C. M. Fortuin (HS! 


1975 


n-fold way 


A. B. Bortz, M. H. Kalos and J. L. 






Lebowitz 


1976 


Cluster countmg algorithm 


T TT 1 1 T~\ T /' 1 1^^^ 

J. Hoshen and R. Kopelman |68j 


1977 


Tlmbrella, Samnliuf? 


G M Torrie and J P Valleau IHHI 


1980 


Ising critical droplets 


A. Coniglio and W. Klein |64i 


1987 


o J TIT ^ u. 1 

bwendsen-Wang cluster algo- 


T~» TT O 1 1 T O "XHT f^^l 

R. H. Swendsen and J.-S. Wang [62] 




rithm 




1988 


Histogram reweighting 


A. M. Ferrenberg and R. H. Swend- 






sen Plj 


1 Q8Q 


VVUili CiUolci di^UilLiilli 


TT WnlflF Pt^ 


1991 


Multicanonical Monte Carlo 


B. A. Berg and T. Neuhaus IH 


1993 


Entropic sampling 


J. Lee IHSl 


1995 


Absorbing Markov Chain 


M. A. Novotny 


2000 


A Guide to Monte Carlo Sim- 


D. P. Landau and K. Binder [1221 




ulations in Statistical Physics 




2001 


Wang-Landau algorithm 


F. Wang and D. P. Landau !TUT| 
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them. 

Then we took up the issue of importance samphng and dealt with it in somewhat 
greater details mainly for two reasons. Importance sampling is an important topic in 
Monte Carlo theory and practice. Almost all Monte Carlo programmes employ impor- 
tance sampling in some form or the other and with varying degrees of sophistication. 
The second reason is that often I find students have difficulties in comprehending what 
and how does importance sampling achieve what it purports to achieve. One way is 
to emphasize that importance sampling reduces the variance of the desired statisti- 
cal variable whose average is what you want to calculate by Monte Carlo simulation. 
Needless to say that the technique should leave the average unchanged. 1 have taken 
this view and elucidated it through a simple example for which all the quantities can 
be calculated analytically. 

Often we need to resort to importance sampling as a matter of necessity; for oth- 
erwise, any finite Monte Carlo sample generated should rarely contain microstates of 
interest to the phenomenon under investigation. Metropolis importance sampling is an 
example. A randomly chosen microstate belonging to a closed system is most likely to 
be of high energy and hence contributes negligibly to the canonical partition sum. 
Hence we have to resort to importance samphng. The Metropohs algorithm generates 
a Markov chain whose (asymptotic) distribution is the equilibrium canonical distribu- 
tion at the desired temperature. Indeed it is the Metropolis algorithm that launched 
the Monte-Carlo-for-statistical-physics business; it remains to date the best algorithm 
in this field. We took Ising model as an example to illustrate the Metropolis algorithm, 
continuous phase transition, critical exponents, finite size effects and finite size scaling. 

When the actual dynamics slows down, the corresponding time-driven Monte Carlo 
simulation also becomes slow. Bortz, Kalos and Lebowitz suggested shifting of the 
(slow) dynamical information to the time axis and derived the so-called n-fold way; 
this event-driven algorithm is rejection free and helps speed up the computations. 
Novotny generalized this technique to Monte Carlo with Absorbing Markov Chain 
(MCAMC). However critical slowing down in continuous phase transition calls for 
special techniques. We discussed the cluster algorithm of Swendsen and Wang, of 
Wolff, and of the one based on invasion percolation etc. Wolff algorithm became 
famous for quite another reason also. It was while studying its performance that 
Ferrenberg, Landau and Wong discovered how dramatically the ' pseudoness ' of pseudo 
random numbers can affect your Monte Carlo results and we had a glimpse of the well 
documented R250 story. 

While on clusters we learnt of the celebrated Hoshen-Kopelman cluster counting 
algorithm; we also discussed blocking technique that helps estimate statistical error 
bars from correlated data and the histogram technique often employed in the study 
of systems very close to critical points. The histogram technique helps you calculate 
macroscopic properties of a system at temperatures different but reasonably close to 
the simulation temperature by suitable reweighting. 

A first order phase transition is associated with the so called supercritical slowing 



^'*thc microcanonical entropy increases rather rapidly with increase of energy; more the energy the 
more is the number of ways of distributing it. 
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down; we found how non-Boltzmann sampling techniques like entropic sampling/multicanonical 
Monte Carlo would help. Then we saw of the Wang-Landau algorithm which improves 
the performance of the multicanonical Monte Carlo methods. We ended our journey 
with a discussion on the recently proposed Jarzynski's equality and the promise this 
technique holds for computation of equilibrium free energies from nonequilibrium sim- 
ulation / experiment . 

As you would have rightly noticed, the journey has been somewhat random and in 
places rather rickety. I made no serious efforts to be otherwise; it had never been my 
intention to take you through all the topics in this vast and growing field. Nor did I try 
to place uniform emphasis on the various topics covered in these notes. I just picked 
up topics that caught my fancy and on which I have some hands-on experience and 
discussed them in a pedagogic style. Nevertheless I believe these notes cover quite a 
bit of ground and would serve as an introduction to this fascinating field. 

The topics we have not visited include simulation of microcanonical and other en- 
sembles, quantum Monte Carlo, renormalization employing Monte Carlo, replica Monte 
Carlo, Monte Carlo for vector and parallel computers, off-lattice models, flat/broad 
histogram sampling, transition matrix Monte Carlo, exchange Monte Carlo, simulated 
tempering, cluster hybrid algorithms and the whole field of simulation of nonequilib- 
rium phenomena. Perhaps we can visit these topics during another journey on another 
occasion. Until then bye. 

The recent comprehensive book by Landau and Binder |122j should provide an 
excellent guide. 
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